[jira] [Comment Edited] (SPARK-19490) Hive partition columns are case-sensitive
[ https://issues.apache.org/jira/browse/SPARK-19490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595852#comment-16595852 ] Harish edited comment on SPARK-19490 at 8/29/18 2:19 AM: - I see the same issue in 2.3.1. After changing the column to lower case(in select statement) then join works fine was (Author: harishk15): I > Hive partition columns are case-sensitive > - > > Key: SPARK-19490 > URL: https://issues.apache.org/jira/browse/SPARK-19490 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 >Reporter: cen yuhai >Priority: Major > > The real partitions columns are lower case (year, month, day) > {code} > Caused by: java.lang.RuntimeException: Expected only partition pruning > predicates: (concat(YEAR#22, MONTH#23, DAY#24) = 20170202) > at scala.sys.package$.error(package.scala:27) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:985) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:976) > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:95) > at > org.apache.spark.sql.hive.HiveExternalCatalog.listPartitionsByFilter(HiveExternalCatalog.scala:976) > at > org.apache.spark.sql.hive.MetastoreRelation.getHiveQlPartitions(MetastoreRelation.scala:161) > at > org.apache.spark.sql.hive.execution.HiveTableScanExec$$anonfun$10.apply(HiveTableScanExec.scala:151) > at > org.apache.spark.sql.hive.execution.HiveTableScanExec$$anonfun$10.apply(HiveTableScanExec.scala:150) > at org.apache.spark.util.Utils$.withDummyCallSite(Utils.scala:2472) > at > org.apache.spark.sql.hive.execution.HiveTableScanExec.doExecute(HiveTableScanExec.scala:149) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113) > at > org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:235) > at > org.apache.spark.sql.execution.FilterExec.inputRDDs(basicPhysicalOperators.scala:124) > at > org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:42) > at > org.apache.spark.sql.execution.aggregate.HashAggregateExec.inputRDDs(HashAggregateExec.scala:141) > at > org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:368) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113) > at > org.apache.spark.sql.execution.exchange.ShuffleExchange.prepareShuffleDependency(ShuffleExchange.scala:85) > at > 
org.apache.spark.sql.execution.exchange.ExchangeCoordinator.doEstimationIfNecessary(ExchangeCoordinator.scala:213) > at > org.apache.spark.sql.execution.exchange.ExchangeCoordinator.postShuffleRDD(ExchangeCoordinator.scala:261) > at > org.apache.spark.sql.execution.exchange.ShuffleExchange$$anonfun$doExecute$1.apply(ShuffleExchange.scala:117) > at > org.apache.spark.sql.execution.exchange.ShuffleExchange$$anonfun$doExecute$1.apply(ShuffleExchange.scala:112) > at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52) > {code} > Use these sql can reproduce this bug: > CREATE TABLE partition_test (key Int) partitioned by (date string) > SELECT * FROM partition_test where DATE = '20170101' -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19490) Hive partition columns are case-sensitive
[ https://issues.apache.org/jira/browse/SPARK-19490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595852#comment-16595852 ] Harish commented on SPARK-19490: I > Hive partition columns are case-sensitive > - > > Key: SPARK-19490 > URL: https://issues.apache.org/jira/browse/SPARK-19490 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 >Reporter: cen yuhai >Priority: Major > > The real partitions columns are lower case (year, month, day) > {code} > Caused by: java.lang.RuntimeException: Expected only partition pruning > predicates: (concat(YEAR#22, MONTH#23, DAY#24) = 20170202) > at scala.sys.package$.error(package.scala:27) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:985) > at > org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:976) > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:95) > at > org.apache.spark.sql.hive.HiveExternalCatalog.listPartitionsByFilter(HiveExternalCatalog.scala:976) > at > org.apache.spark.sql.hive.MetastoreRelation.getHiveQlPartitions(MetastoreRelation.scala:161) > at > org.apache.spark.sql.hive.execution.HiveTableScanExec$$anonfun$10.apply(HiveTableScanExec.scala:151) > at > org.apache.spark.sql.hive.execution.HiveTableScanExec$$anonfun$10.apply(HiveTableScanExec.scala:150) > at org.apache.spark.util.Utils$.withDummyCallSite(Utils.scala:2472) > at > org.apache.spark.sql.hive.execution.HiveTableScanExec.doExecute(HiveTableScanExec.scala:149) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113) > at > org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:235) > at > org.apache.spark.sql.execution.FilterExec.inputRDDs(basicPhysicalOperators.scala:124) > at > org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:42) > at > org.apache.spark.sql.execution.aggregate.HashAggregateExec.inputRDDs(HashAggregateExec.scala:141) > at > org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:368) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113) > at > org.apache.spark.sql.execution.exchange.ShuffleExchange.prepareShuffleDependency(ShuffleExchange.scala:85) > at > org.apache.spark.sql.execution.exchange.ExchangeCoordinator.doEstimationIfNecessary(ExchangeCoordinator.scala:213) > at > org.apache.spark.sql.execution.exchange.ExchangeCoordinator.postShuffleRDD(ExchangeCoordinator.scala:261) > at > 
org.apache.spark.sql.execution.exchange.ShuffleExchange$$anonfun$doExecute$1.apply(ShuffleExchange.scala:117) > at > org.apache.spark.sql.execution.exchange.ShuffleExchange$$anonfun$doExecute$1.apply(ShuffleExchange.scala:112) > at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52) > {code} > Use these sql can reproduce this bug: > CREATE TABLE partition_test (key Int) partitioned by (date string) > SELECT * FROM partition_test where DATE = '20170101' -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
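The workaround reported in the comments above can be sketched in PySpark as follows. This is only an illustration built from the reproduction SQL in the issue description (table partition_test, partition column date), not code taken from the ticket itself.
{code}
# Hedged sketch of the workaround described above: reference the partition column with
# the same (lower) case that the Hive metastore stores, so the partition-pruning
# predicate matches the real partition column name.
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Reproduction table from the issue description.
spark.sql("CREATE TABLE IF NOT EXISTS partition_test (key INT) PARTITIONED BY (date STRING)")

# Fails on affected versions: the predicate uses DATE, but the partition column is 'date'.
# spark.sql("SELECT * FROM partition_test WHERE DATE = '20170101'").show()

# Works: use the lower-case partition column name in the filter (and in join conditions).
spark.sql("SELECT * FROM partition_test WHERE date = '20170101'").show()
{code}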
[jira] [Resolved] (SPARK-24765) Add custom Kubernetes scheduler config parameter to spark-submit
[ https://issues.apache.org/jira/browse/SPARK-24765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nihal Harish resolved SPARK-24765. -- Resolution: Information Provided Issue is addressed by https://issues.apache.org/jira/browse/SPARK-24434 > Add custom Kubernetes scheduler config parameter to spark-submit > - > > Key: SPARK-24765 > URL: https://issues.apache.org/jira/browse/SPARK-24765 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 2.3.1 >Reporter: Nihal Harish >Priority: Minor > > spark submit currently does not accept any config parameter that can enable > the driver and executor pods to be scheduled by a custom scheduler as opposed > to just the default-scheduler. > I propose the addition of a new config parameter: > spark.kubernetes.schedulerName > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-24765) Add custom Kubernetes scheduler config parameter to spark-submit
[ https://issues.apache.org/jira/browse/SPARK-24765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nihal Harish updated SPARK-24765: - Description: spark submit currently does not accept any config parameter that can enable the driver and executor pods to be scheduled by a custom scheduler as opposed to just the default-scheduler. I propose the addition of a new config parameter: spark.kubernetes.schedulerName was: spark submit currently does not accept any config parameter that can enable the driver and executor pods to be scheduled by a custom scheduler as opposed to just the default-scheduler. I propose the addition of three new config parameters: spark.kubernetes.schedulerName spark.kubernetes.driver.schedulerName spark.kubernetes.executor.schedulerName > Add custom Kubernetes scheduler config parameter to spark-submit > - > > Key: SPARK-24765 > URL: https://issues.apache.org/jira/browse/SPARK-24765 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 2.3.1 >Reporter: Nihal Harish >Priority: Minor > > spark submit currently does not accept any config parameter that can enable > the driver and executor pods to be scheduled by a custom scheduler as opposed > to just the default-scheduler. > I propose the addition of a new config parameter: > spark.kubernetes.schedulerName > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-24765) Add custom Kubernetes scheduler config parameter to spark-submit
Nihal Harish created SPARK-24765: Summary: Add custom Kubernetes scheduler config parameter to spark-submit Key: SPARK-24765 URL: https://issues.apache.org/jira/browse/SPARK-24765 Project: Spark Issue Type: New Feature Components: Kubernetes Affects Versions: 2.3.1 Reporter: Nihal Harish spark submit currently does not accept any config parameter that can enable the driver and executor pods to be scheduled by a custom scheduler as opposed to just the default-scheduler. I propose the addition of three new config parameters: spark.kubernetes.schedulerName spark.kubernetes.driver.schedulerName spark.kubernetes.executor.schedulerName -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
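To illustrate the shape of the proposal, a submit-time invocation might look like the sketch below. spark.kubernetes.schedulerName is the parameter proposed in this ticket, not an option that existed in the affected version, and the master URL, image, jar path, and scheduler name are all placeholders.
{code}
spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
  --deploy-mode cluster \
  --name my-app \
  --conf spark.kubernetes.container.image=<spark-image> \
  --conf spark.kubernetes.schedulerName=my-custom-scheduler \
  local:///path/to/app.jar
{code}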
[jira] [Issue Comment Deleted] (SPARK-18681) Throw Filtering is supported only on partition keys of type string exception
[ https://issues.apache.org/jira/browse/SPARK-18681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish updated SPARK-18681: --- Comment: was deleted (was: [~michael] I see the same issue in 2.3.1. I have hive partitioned table with integer as partition key. I set ("spark.sql.hive.manageFilesourcePartitions","false") in my pyspark code. Caused by: java.lang.RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive. You can set the Spark configuration setting spark.sql.hive.manageFilesourcePartitions to false to work around this problem, however this will result in degraded performance. Please report a bug: https://issues.apache.org/jira/browse/SPARK at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:741) at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:655) at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:653) at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:272) at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:210) at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:209) at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:255) at org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionsByFilter(HiveClientImpl.scala:653) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:1218) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:1211) at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97) at org.apache.spark.sql.hive.HiveExternalCatalog.listPartitionsByFilter(HiveExternalCatalog.scala:1211) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.listPartitionsByFilter(SessionCatalog.scala:925) at org.apache.spark.sql.hive.execution.HiveTableScanExec.rawPartitions$lzycompute(HiveTableScanExec.scala:172) at org.apache.spark.sql.hive.execution.HiveTableScanExec.rawPartitions(HiveTableScanExec.scala:164) at org.apache.spark.sql.hive.execution.HiveTableScanExec$$anonfun$11.apply(HiveTableScanExec.scala:190) at org.apache.spark.sql.hive.execution.HiveTableScanExec$$anonfun$11.apply(HiveTableScanExec.scala:190) at org.apache.spark.util.Utils$.withDummyCallSite(Utils.scala:2515) at org.apache.spark.sql.hive.execution.HiveTableScanExec.doExecute(HiveTableScanExec.scala:189) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:371) at org.apache.spark.sql.execution.FilterExec.inputRDDs(basicPhysicalOperators.scala:121) at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.inputRDDs(BroadcastHashJoinExec.scala:76) at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41) at 
org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.inputRDDs(BroadcastHashJoinExec.scala:76) at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41) at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:605) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.prepareShuffleDependency(ShuffleExchangeExec.scala:92) at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:128) at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:119) at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52) ... 38 more Caused by: java.lang.reflect.InvocationTargetException at
[jira] [Commented] (SPARK-18681) Throw Filtering is supported only on partition keys of type string exception
[ https://issues.apache.org/jira/browse/SPARK-18681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16523049#comment-16523049 ] Harish commented on SPARK-18681: [~michael] I see the same issue in 2.3.1. I have hive partitioned table with integer as partition key. I set ("spark.sql.hive.manageFilesourcePartitions","false") in my pyspark code. Caused by: java.lang.RuntimeException: Caught Hive MetaException attempting to get partition metadata by filter from Hive. You can set the Spark configuration setting spark.sql.hive.manageFilesourcePartitions to false to work around this problem, however this will result in degraded performance. Please report a bug: https://issues.apache.org/jira/browse/SPARK at org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:741) at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:655) at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:653) at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:272) at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:210) at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:209) at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:255) at org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionsByFilter(HiveClientImpl.scala:653) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:1218) at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:1211) at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97) at org.apache.spark.sql.hive.HiveExternalCatalog.listPartitionsByFilter(HiveExternalCatalog.scala:1211) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.listPartitionsByFilter(SessionCatalog.scala:925) at org.apache.spark.sql.hive.execution.HiveTableScanExec.rawPartitions$lzycompute(HiveTableScanExec.scala:172) at org.apache.spark.sql.hive.execution.HiveTableScanExec.rawPartitions(HiveTableScanExec.scala:164) at org.apache.spark.sql.hive.execution.HiveTableScanExec$$anonfun$11.apply(HiveTableScanExec.scala:190) at org.apache.spark.sql.hive.execution.HiveTableScanExec$$anonfun$11.apply(HiveTableScanExec.scala:190) at org.apache.spark.util.Utils$.withDummyCallSite(Utils.scala:2515) at org.apache.spark.sql.hive.execution.HiveTableScanExec.doExecute(HiveTableScanExec.scala:189) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:371) at org.apache.spark.sql.execution.FilterExec.inputRDDs(basicPhysicalOperators.scala:121) at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.inputRDDs(BroadcastHashJoinExec.scala:76) at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41) at 
org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.inputRDDs(BroadcastHashJoinExec.scala:76) at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:41) at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:605) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.prepareShuffleDependency(ShuffleExchangeExec.scala:92) at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:128) at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec$$anonfun$doExecute$1.apply(ShuffleExchangeExec.scala:119) at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52) ... 38 more Caused by: java.lang.reflect.InvocationTargetException at
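For reference, the configuration named in the error message above can be supplied when the session is created, or on the spark-submit command line. Note that the comment reports it did not resolve the problem for a Hive table partitioned by an integer key, so this is a mitigation to try rather than a confirmed fix.
{code}
# Hedged sketch only: apply the setting named in the error message at session-creation
# time, before the Hive catalog is used.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.sql.hive.manageFilesourcePartitions", "false")
         .enableHiveSupport()
         .getOrCreate())

# Equivalent submit-time form:
#   spark-submit --conf spark.sql.hive.manageFilesourcePartitions=false my_job.py
{code}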
[jira] [Comment Edited] (SPARK-19243) Error when selecting from DataFrame containing parsed data from files larger than 1MB
[ https://issues.apache.org/jira/browse/SPARK-19243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15995489#comment-15995489 ] Harish edited comment on SPARK-19243 at 5/3/17 7:52 PM: i am getting the same error in spark 2.1.0. I have 10 node cluster with 109GB each. My data set is just 30K rows with 60 columns. I see total 72 partitions after loading the orc file to DF. then re-partitioned to 2001. No luck. [~srowen] did any one raised the similar issue? Regards, Harish was (Author: harishk15): i am getting the same error in spark 2.1.0. I have 10 node cluster with 109GB each. My data set is just 30K rows with 60 columns. I see total 72 partitions after loading the orc file to DF. then re-partitioned to 2001. No luck. Regards, Harish > Error when selecting from DataFrame containing parsed data from files larger > than 1MB > - > > Key: SPARK-19243 > URL: https://issues.apache.org/jira/browse/SPARK-19243 > Project: Spark > Issue Type: Bug >Reporter: Ben > > I hope I can describe the problem clearly. This error happens with Spark > 2.0.1. However, I tried with Spark 2.1.0 on my test PC and it worked there, > none of the issues below, but I can't try it on the test cluster because > Spark needs to be upgraded there. I'm opening this ticket because if it's a > bug, maybe something is still partially present in Spark 2.1.0. > Initially I though it was my script's problem so I tried to debug, until I > found why this is happening. > Step by step, I load XML files through spark-xml into a DataFrame. In my > case, the rowTag is the root tag, so each XML file creates a row. The XML > structure is fairly complex, which are converted to nested columns or arrays > inside the DF. Since I need to flatten the whole table, and since the output > is not fixed but I dynamically select what I want as output, in case I need > to output columns that have been parsed as arrays, then I explode them with > explode() only when needed. > Normally I can select various columns that don't have many entries without a > problem. > I select a column that has a lot of entries into a new DF, e.g. simply through > {noformat} > df2 = df.select(...) > {noformat} > and then if I try to do a count() or first() or anything, Spark behaves two > ways: > 1. If the source file was smaller than 1MB, it works. > 2. If the source file was larger than 1MB, the following error occurs: > {noformat} > Traceback (most recent call last): > File \"/myCode.py\", line 71, in main > df.count() > File > \"/usr/hdp/current/spark-client/python/lib/pyspark.zip/pyspark/sql/dataframe.py\", > line 299, in count > return int(self._jdf.count()) > File > \"/usr/hdp/current/spark-client/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py\", > line 1133, in __call__ > answer, self.gateway_client, self.target_id, self.name) > File > \"/usr/hdp/current/spark-client/python/lib/pyspark.zip/pyspark/sql/utils.py\", > line 63, in deco > return f(*a, **kw) > File > \"/usr/hdp/current/spark-client/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py\", > line 319, in get_return_value > format(target_id, \".\", name), value) > Py4JJavaError: An error occurred while calling o180.count. 
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 > in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 > (TID 6, compname): java.lang.IllegalArgumentException: Size exceeds > Integer.MAX_VALUE > at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:869) > at > org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:103) > at > org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:91) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1307) > at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:105) > at > org.apache.spark.storage.BlockManager.getLocalValues(BlockManager.scala:438) > at org.apache.spark.storage.BlockManager.get(BlockManager.scala:606) > at > org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:663) > at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:330) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:281) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at
[jira] [Commented] (SPARK-19243) Error when selecting from DataFrame containing parsed data from files larger than 1MB
[ https://issues.apache.org/jira/browse/SPARK-19243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15995489#comment-15995489 ] Harish commented on SPARK-19243: i am getting the same error in spark 2.1.0. I have 10 node cluster with 109GB each. My data set is just 30K rows with 60 columns. I see total 72 partitions after loading the orc file to DF. then re-partitioned to 2001. No luck. Regards, Harish > Error when selecting from DataFrame containing parsed data from files larger > than 1MB > - > > Key: SPARK-19243 > URL: https://issues.apache.org/jira/browse/SPARK-19243 > Project: Spark > Issue Type: Bug >Reporter: Ben > > I hope I can describe the problem clearly. This error happens with Spark > 2.0.1. However, I tried with Spark 2.1.0 on my test PC and it worked there, > none of the issues below, but I can't try it on the test cluster because > Spark needs to be upgraded there. I'm opening this ticket because if it's a > bug, maybe something is still partially present in Spark 2.1.0. > Initially I though it was my script's problem so I tried to debug, until I > found why this is happening. > Step by step, I load XML files through spark-xml into a DataFrame. In my > case, the rowTag is the root tag, so each XML file creates a row. The XML > structure is fairly complex, which are converted to nested columns or arrays > inside the DF. Since I need to flatten the whole table, and since the output > is not fixed but I dynamically select what I want as output, in case I need > to output columns that have been parsed as arrays, then I explode them with > explode() only when needed. > Normally I can select various columns that don't have many entries without a > problem. > I select a column that has a lot of entries into a new DF, e.g. simply through > {noformat} > df2 = df.select(...) > {noformat} > and then if I try to do a count() or first() or anything, Spark behaves two > ways: > 1. If the source file was smaller than 1MB, it works. > 2. If the source file was larger than 1MB, the following error occurs: > {noformat} > Traceback (most recent call last): > File \"/myCode.py\", line 71, in main > df.count() > File > \"/usr/hdp/current/spark-client/python/lib/pyspark.zip/pyspark/sql/dataframe.py\", > line 299, in count > return int(self._jdf.count()) > File > \"/usr/hdp/current/spark-client/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py\", > line 1133, in __call__ > answer, self.gateway_client, self.target_id, self.name) > File > \"/usr/hdp/current/spark-client/python/lib/pyspark.zip/pyspark/sql/utils.py\", > line 63, in deco > return f(*a, **kw) > File > \"/usr/hdp/current/spark-client/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py\", > line 319, in get_return_value > format(target_id, \".\", name), value) > Py4JJavaError: An error occurred while calling o180.count. 
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 > in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 > (TID 6, compname): java.lang.IllegalArgumentException: Size exceeds > Integer.MAX_VALUE > at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:869) > at > org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:103) > at > org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:91) > at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1307) > at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:105) > at > org.apache.spark.storage.BlockManager.getLocalValues(BlockManager.scala:438) > at org.apache.spark.storage.BlockManager.get(BlockManager.scala:606) > at > org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:663) > at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:330) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:281) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at
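For readers unfamiliar with spark-xml, the workflow described in the report can be reconstructed roughly as below. The rowTag value, column name, and path are placeholders rather than details from the original job; this sketch only shows where the failure was observed, it is not a fix.
{code}
# Hedged reconstruction of the reported workflow: load XML with spark-xml using the root
# element as rowTag (so each file becomes one row), then select a heavily nested column,
# exploding it only when it was parsed as an array.
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode

spark = SparkSession.builder.getOrCreate()

df = (spark.read.format("com.databricks.spark.xml")
      .option("rowTag", "root")            # placeholder root tag
      .load("/path/to/xml/files"))         # placeholder path

df2 = df.select(explode("bigColumn").alias("entry"))  # placeholder array-typed column
df2.count()   # on affected versions with inputs > 1MB, this action surfaced the error
{code}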
[jira] [Created] (SPARK-20362) spark submit not considering user defined Configs (Pyspark)
Harish created SPARK-20362: -- Summary: spark submit not considering user defined Configs (Pyspark) Key: SPARK-20362 URL: https://issues.apache.org/jira/browse/SPARK-20362 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 2.1.0 Reporter: Harish I am trying to set up the custom configuration on runtime (pyspark), but in my spark UI :8080 i see my job is using complete node/cluster resources and application name is "test.py"(which is script name). It looks like the user defined configurations are not considered in job submit. command : spark-submit test.py standalone mode(2 nodes and 1 master) Here is the code: test.py from pyspark.sql import SparkSession from pyspark import SparkConf if __name__ == "__main__": conf = SparkConf().setAll([('spark.executor.memory', '8g'), ('spark.executor.cores', '3'), ('spark.cores.max', '10'), ('spark.driver.memory','8g')]) spark = SparkSession.builder.config(conf=conf).enableHiveSupport().getOrCreate() sc = spark.sparkContext print(sc.getConf().getAll()) sqlContext = SQLContext(sc) hiveContext = HiveContext(sc) print(hiveContext) print(sc.getConf().getAll()) print("Complete") Print: [('spark.jars.packages', 'com.databricks:spark-csv_2.11:1.2.0'), ('spark.local.dir', '/mnt/sparklocaldir/'), ('hive.metastore.warehouse.dir', ''), ('spark.app.id', 'app-20170417221942-0003'), ('spark.jars', 'file:/home/user/.ivy2/jars/com.databricks_spark-csv_2.11-1.2.0.jar,file:/home/user/.ivy2/jars/org.apache.commons_commons-csv-1.1.jar,file:/home/user/.ivy2/jars/com.univocity_univocity-parsers-1.5.1.jar'), ('spark.executor.id', 'driver'), ('spark.app.name', 'test.py'), ('spark.cores.max', '10'), ('spark.serializer', 'org.apache.spark.serializer.KryoSerializer'), ('spark.driver.port', '35596'), ('spark.sql.catalogImplementation', 'hive'), ('spark.sql.warehouse.dir', ''), ('spark.rdd.compress', 'True'), ('spark.driver.memory', '8g'), ('spark.serializer.objectStreamReset', '100'), ('spark.executor.memory', '8g'), ('spark.executor.cores', '3'), ('spark.submit.deployMode', 'client'), ('spark.files', 'file:/home/user/test.py,file:/home/user/.ivy2/jars/com.databricks_spark-csv_2.11-1.2.0.jar,file:/home/user/.ivy2/jars/org.apache.commons_commons-csv-1.1.jar,file:/home/user/.ivy2/jars/com.univocity_univocity-parsers-1.5.1.jar'), ('spark.master', 'spark://master:7077'), ('spark.submit.pyFiles', '/home/user/.ivy2/jars/com.databricks_spark-csv_2.11-1.2.0.jar,/home/user/.ivy2/jars/org.apache.commons_commons-csv-1.1.jar,/home/user/.ivy2/jars/com.univocity_univocity-parsers-1.5.1.jar'), ('spark.driver.host', 'master')] [('spark.jars.packages', 'com.databricks:spark-csv_2.11:1.2.0'), ('spark.local.dir', '/mnt/sparklocaldir/'), ('hive.metastore.warehouse.dir', ''), ('spark.app.id', 'app-20170417221942-0003'), ('spark.jars', 'file:/home/user/.ivy2/jars/com.databricks_spark-csv_2.11-1.2.0.jar,file:/home/user/.ivy2/jars/org.apache.commons_commons-csv-1.1.jar,file:/home/user/.ivy2/jars/com.univocity_univocity-parsers-1.5.1.jar'), ('spark.executor.id', 'driver'), ('spark.app.name', 'test.py'), ('spark.cores.max', '10'), ('spark.serializer', 'org.apache.spark.serializer.KryoSerializer'), ('spark.driver.port', '35596'), ('spark.sql.catalogImplementation', 'hive'), ('spark.sql.warehouse.dir', ''), ('spark.rdd.compress', 'True'), ('spark.driver.memory', '8g'), ('spark.serializer.objectStreamReset', '100'), ('spark.executor.memory', '8g'), ('spark.executor.cores', '3'), ('spark.submit.deployMode', 'client'), ('spark.files', 
'file:/home/user/test.py,file:/home/user/.ivy2/jars/com.databricks_spark-csv_2.11-1.2.0.jar,file:/home/user/.ivy2/jars/org.apache.commons_commons-csv-1.1.jar,file:/home/user/.ivy2/jars/com.univocity_univocity-parsers-1.5.1.jar'), ('spark.master', 'spark://master:7077'), ('spark.submit.pyFiles', '/home/user/.ivy2/jars/com.databricks_spark-csv_2.11-1.2.0.jar,/home/user/.ivy2/jars/org.apache.commons_commons-csv-1.1.jar,/home/user/.ivy2/jars/com.univocity_univocity-parsers-1.5.1.jar'), ('spark.driver.host', 'master')] -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
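One commonly suggested alternative, sketched below with values mirroring the report, is to pass the same settings to spark-submit itself so they are in place before the driver is launched and the application registers with the standalone master. This is an illustration, not a confirmed resolution of the ticket; the application name is a placeholder.
{code}
spark-submit \
  --master spark://master:7077 \
  --name my_app \
  --conf spark.driver.memory=8g \
  --conf spark.executor.memory=8g \
  --conf spark.executor.cores=3 \
  --conf spark.cores.max=10 \
  test.py
{code}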
[jira] [Commented] (SPARK-20022) java.lang.OutOfMemoryError: Unable to acquire 4228 bytes of memory
[ https://issues.apache.org/jira/browse/SPARK-20022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15933121#comment-15933121 ] Harish commented on SPARK-20022: Isn't it same as "https://issues.apache.org/jira/browse/SPARK-14363; I see same log in that thread. Correct me if i am wrong. > java.lang.OutOfMemoryError: Unable to acquire 4228 bytes of memory > -- > > Key: SPARK-20022 > URL: https://issues.apache.org/jira/browse/SPARK-20022 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.0.2 >Reporter: Harish > > I am getting below error in 2.0.2. Any help? or work around? > WARN TaskSetManager: Lost task 34.0 in stage 2007.0 (TID 498115, <>): > java.lang.OutOfMemoryError: Unable to acquire 4228 bytes of memory, got 0 > at > org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:129) > at > org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:377) > at > org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:399) > at > org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:94) > at > org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:175) > at > org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:98) > at > org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:91) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:89) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) > at org.apache.spark.scheduler.Task.run(Task.scala:86) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-20022) java.lang.OutOfMemoryError: Unable to acquire 4228 bytes of memory
[ https://issues.apache.org/jira/browse/SPARK-20022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish updated SPARK-20022: --- Description: I am getting below error in 2.0.2. Any help? or work around? WARN TaskSetManager: Lost task 34.0 in stage 2007.0 (TID 498115, <>): java.lang.OutOfMemoryError: Unable to acquire 4228 bytes of memory, got 0 at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:129) at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:377) at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:399) at org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:94) at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:175) at org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:98) at org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:91) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:89) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) at org.apache.spark.scheduler.Task.run(Task.scala:86) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) was: I am getting below error in 2.0.2. Any help? 
WARN TaskSetManager: Lost task 34.0 in stage 2007.0 (TID 498115, <>): java.lang.OutOfMemoryError: Unable to acquire 4228 bytes of memory, got 0 at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:129) at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:377) at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:399) at org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:94) at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:175) at org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:98) at org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:91) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:89) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) at
[jira] [Updated] (SPARK-20022) java.lang.OutOfMemoryError: Unable to acquire 4228 bytes of memory
[ https://issues.apache.org/jira/browse/SPARK-20022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish updated SPARK-20022: --- Description: I am getting below error in 2.0.2. Any help? WARN TaskSetManager: Lost task 34.0 in stage 2007.0 (TID 498115, <>): java.lang.OutOfMemoryError: Unable to acquire 4228 bytes of memory, got 0 at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:129) at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:377) at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:399) at org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:94) at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:175) at org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:98) at org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:91) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:89) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) at org.apache.spark.scheduler.Task.run(Task.scala:86) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) was: I am getting below error in 2.0.2. Any help? 
WARN TaskSetManager: Lost task 34.0 in stage 2007.0 (TID 498115, <>): java.lang.OutOfMemoryError: Unable to acquire 4228 bytes of memory, got 0 at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:129) at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:377) at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:399) at org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:94) at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:175) at org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:98) at org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:91) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:89) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) at
[jira] [Created] (SPARK-20022) java.lang.OutOfMemoryError: Unable to acquire 4228 bytes of memory
Harish created SPARK-20022: -- Summary: java.lang.OutOfMemoryError: Unable to acquire 4228 bytes of memory Key: SPARK-20022 URL: https://issues.apache.org/jira/browse/SPARK-20022 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 2.0.2 Reporter: Harish I am getting below error in 2.0.2. Any help? WARN TaskSetManager: Lost task 34.0 in stage 2007.0 (TID 498115, <>): java.lang.OutOfMemoryError: Unable to acquire 4228 bytes of memory, got 0 at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:129) at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:377) at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:399) at org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:94) at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:175) at org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:98) at org.apache.spark.sql.execution.SortExec$$anonfun$1.apply(SortExec.scala:91) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:803) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:89) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) at org.apache.spark.scheduler.Task.run(Task.scala:86) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
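No fix is recorded in this ticket. The sketch below only lists generic mitigations that are commonly tried for this class of sort/shuffle memory failure (smaller per-task sort volume, larger executor heap); all values are illustrative placeholders and none of them are confirmed to resolve the reported case.
{code}
# Hedged, generic mitigations only (not from this ticket): give each task less data to
# sort by increasing the number of shuffle partitions, and give executors more memory.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.sql.shuffle.partitions", "2000")   # more, smaller shuffle tasks
         .config("spark.executor.memory", "8g")            # larger executor heap
         .getOrCreate())
{code}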
[jira] [Commented] (SPARK-18789) Save Data frame with Null column-- exception
[ https://issues.apache.org/jira/browse/SPARK-18789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929259#comment-15929259 ] Harish commented on SPARK-18789: When you create the DF (dynamic) withough knowing the type of the column then you cant define the schema. In my case i am not knowing the type of a column. When you dont define the column type and if the entire column in None then i am getting this error message. i hope i am clear. > Save Data frame with Null column-- exception > > > Key: SPARK-18789 > URL: https://issues.apache.org/jira/browse/SPARK-18789 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.2 >Reporter: Harish > > I am trying to save a DF to HDFS which is having 1 column is NULL(no data). > col1 col2 col3 > a 1 null > b 1 null > c1null > d 1 null > code : df.write.format("orc").save(path, mode='overwrite') > Error: > java.lang.IllegalArgumentException: Error: type expected at the position 49 > of 'string:string:string:double:string:double:string:null' but 'null' is > found. > at > org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:348) > at > org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:331) > at > org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseType(TypeInfoUtils.java:392) > at > org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseTypeInfos(TypeInfoUtils.java:305) > at > org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils.getTypeInfosFromTypeString(TypeInfoUtils.java:765) > at > org.apache.hadoop.hive.ql.io.orc.OrcSerde.initialize(OrcSerde.java:104) > at > org.apache.spark.sql.hive.orc.OrcSerializer.(OrcFileFormat.scala:182) > at > org.apache.spark.sql.hive.orc.OrcOutputWriter.(OrcFileFormat.scala:225) > at > org.apache.spark.sql.hive.orc.OrcFileFormat$$anon$1.newInstance(OrcFileFormat.scala:94) > at > org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:131) > at > org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:247) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) > at org.apache.spark.scheduler.Task.run(Task.scala:86) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 16/12/08 19:41:49 ERROR TaskSetManager: Task 17 in stage 512.0 failed 4 > times; aborting job > 16/12/08 19:41:49 ERROR InsertIntoHadoopFsRelationCommand: Aborting job. > org.apache.spark.SparkException: Job aborted due to stage failure: Task 17 in > stage 512.0 failed 4 times, most recent failure: Lost task 17.3 in stage > 512.0 (TID 37290, 10.63.136.108): java.lang.IllegalArgumentException: Error: > type expected at the position 49 of > 'string:string:string:double:string:double:string:null' but 'null' is found. 
> at > org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:348) > at > org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:331) > at > org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseType(TypeInfoUtils.java:392) > at > org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseTypeInfos(TypeInfoUtils.java:305) > at > org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils.getTypeInfosFromTypeString(TypeInfoUtils.java:765) > at > org.apache.hadoop.hive.ql.io.orc.OrcSerde.initialize(OrcSerde.java:104) > at > org.apache.spark.sql.hive.orc.OrcSerializer.(OrcFileFormat.scala:182) > at > org.apache.spark.sql.hive.orc.OrcOutputWriter.(OrcFileFormat.scala:225) > at > org.apache.spark.sql.hive.orc.OrcFileFormat$$anon$1.newInstance(OrcFileFormat.scala:94) > at > org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:131) > at > org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:247) > at >
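A workaround consistent with the discussion above is to give every all-null column an explicit type before writing ORC, since a column inferred as NullType cannot be described to the ORC serde. The sketch below simulates such a column with lit(None); on Spark 2.0.x the ORC source may additionally require Hive support to be enabled.
{code}
# Hedged sketch: cast NullType columns to a concrete type so the ORC writer gets a valid
# schema. 'col3' here is a simulated stand-in for the dynamically built all-None column.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lit
from pyspark.sql.types import NullType

spark = SparkSession.builder.getOrCreate()

# Simulate a DataFrame in which one column is entirely null and was inferred as NullType.
df = spark.range(3).withColumn("col3", lit(None))

# Cast every NullType column to an explicit type (string here) before saving as ORC.
for f in df.schema.fields:
    if isinstance(f.dataType, NullType):
        df = df.withColumn(f.name, col(f.name).cast("string"))

df.write.format("orc").save("/tmp/null_column_out", mode="overwrite")
{code}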
[jira] [Commented] (SPARK-18789) Save Data frame with Null column-- exception
[ https://issues.apache.org/jira/browse/SPARK-18789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15928926#comment-15928926 ] Harish commented on SPARK-18789: In your example you are defining the schema first and then loading the data. Which works. Try to create the DF without defining the schema (column type). > Save Data frame with Null column-- exception > > > Key: SPARK-18789 > URL: https://issues.apache.org/jira/browse/SPARK-18789 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.2 >Reporter: Harish > > I am trying to save a DF to HDFS which is having 1 column is NULL(no data). > col1 col2 col3 > a 1 null > b 1 null > c1null > d 1 null > code : df.write.format("orc").save(path, mode='overwrite') > Error: > java.lang.IllegalArgumentException: Error: type expected at the position 49 > of 'string:string:string:double:string:double:string:null' but 'null' is > found. > at > org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:348) > at > org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:331) > at > org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseType(TypeInfoUtils.java:392) > at > org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseTypeInfos(TypeInfoUtils.java:305) > at > org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils.getTypeInfosFromTypeString(TypeInfoUtils.java:765) > at > org.apache.hadoop.hive.ql.io.orc.OrcSerde.initialize(OrcSerde.java:104) > at > org.apache.spark.sql.hive.orc.OrcSerializer.(OrcFileFormat.scala:182) > at > org.apache.spark.sql.hive.orc.OrcOutputWriter.(OrcFileFormat.scala:225) > at > org.apache.spark.sql.hive.orc.OrcFileFormat$$anon$1.newInstance(OrcFileFormat.scala:94) > at > org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:131) > at > org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:247) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) > at org.apache.spark.scheduler.Task.run(Task.scala:86) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 16/12/08 19:41:49 ERROR TaskSetManager: Task 17 in stage 512.0 failed 4 > times; aborting job > 16/12/08 19:41:49 ERROR InsertIntoHadoopFsRelationCommand: Aborting job. > org.apache.spark.SparkException: Job aborted due to stage failure: Task 17 in > stage 512.0 failed 4 times, most recent failure: Lost task 17.3 in stage > 512.0 (TID 37290, 10.63.136.108): java.lang.IllegalArgumentException: Error: > type expected at the position 49 of > 'string:string:string:double:string:double:string:null' but 'null' is found. 
> at > org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:348) > at > org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:331) > at > org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseType(TypeInfoUtils.java:392) > at > org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseTypeInfos(TypeInfoUtils.java:305) > at > org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils.getTypeInfosFromTypeString(TypeInfoUtils.java:765) > at > org.apache.hadoop.hive.ql.io.orc.OrcSerde.initialize(OrcSerde.java:104) > at > org.apache.spark.sql.hive.orc.OrcSerializer.(OrcFileFormat.scala:182) > at > org.apache.spark.sql.hive.orc.OrcOutputWriter.(OrcFileFormat.scala:225) > at > org.apache.spark.sql.hive.orc.OrcFileFormat$$anon$1.newInstance(OrcFileFormat.scala:94) > at > org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:131) > at > org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:247) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
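Editor's note on the comment above: it points at schema inference as the trigger. When the DataFrame is built without an explicit schema, an all-null column is inferred as NullType, which the Hive ORC serde cannot express — hence the 'null' at the end of the type string 'string:...:null'. Below is a minimal PySpark sketch (column names and the output path are illustrative, not taken from the reporter's job) showing the pattern and the usual workaround of casting the empty column to a concrete type before writing ORC.
{code}
from pyspark.sql import SparkSession
from pyspark.sql import functions as func
from pyspark.sql.types import StringType

# ORC writes in Spark 2.0.x go through the Hive serde, hence Hive support.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

df = spark.createDataFrame([("a", 1), ("b", 1), ("c", 1), ("d", 1)], ["col1", "col2"])

# Column added without a type: lit(None) gives it NullType, which shows up
# as 'null' in the Hive type string and breaks the ORC writer.
df_bad = df.withColumn("col3", func.lit(None))

# Workaround: give the empty column an explicit type before saving.
df_ok = df.withColumn("col3", func.lit(None).cast(StringType()))
df_ok.write.format("orc").mode("overwrite").save("/tmp/spark18789_example")
{code}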
[jira] [Updated] (SPARK-18789) Save Data frame with Null column-- exception
[ https://issues.apache.org/jira/browse/SPARK-18789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish updated SPARK-18789: --- Summary: Save Data frame with Null column-- exception (was: Save Data frame with Null column exception) > Save Data frame with Null column-- exception > > > Key: SPARK-18789 > URL: https://issues.apache.org/jira/browse/SPARK-18789 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.2 >Reporter: Harish > > I am trying to save a DF to HDFS which is having 1 column is NULL(no data). > col1 col2 col3 > a 1 null > b 1 null > c1null > d 1 null > code : df.write.format("orc").save(path, mode='overwrite') > Error: > java.lang.IllegalArgumentException: Error: type expected at the position 49 > of 'string:string:string:double:string:double:string:null' but 'null' is > found. > at > org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:348) > at > org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:331) > at > org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseType(TypeInfoUtils.java:392) > at > org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseTypeInfos(TypeInfoUtils.java:305) > at > org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils.getTypeInfosFromTypeString(TypeInfoUtils.java:765) > at > org.apache.hadoop.hive.ql.io.orc.OrcSerde.initialize(OrcSerde.java:104) > at > org.apache.spark.sql.hive.orc.OrcSerializer.(OrcFileFormat.scala:182) > at > org.apache.spark.sql.hive.orc.OrcOutputWriter.(OrcFileFormat.scala:225) > at > org.apache.spark.sql.hive.orc.OrcFileFormat$$anon$1.newInstance(OrcFileFormat.scala:94) > at > org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:131) > at > org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:247) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) > at org.apache.spark.scheduler.Task.run(Task.scala:86) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 16/12/08 19:41:49 ERROR TaskSetManager: Task 17 in stage 512.0 failed 4 > times; aborting job > 16/12/08 19:41:49 ERROR InsertIntoHadoopFsRelationCommand: Aborting job. > org.apache.spark.SparkException: Job aborted due to stage failure: Task 17 in > stage 512.0 failed 4 times, most recent failure: Lost task 17.3 in stage > 512.0 (TID 37290, 10.63.136.108): java.lang.IllegalArgumentException: Error: > type expected at the position 49 of > 'string:string:string:double:string:double:string:null' but 'null' is found. 
> at > org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:348) > at > org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:331) > at > org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseType(TypeInfoUtils.java:392) > at > org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseTypeInfos(TypeInfoUtils.java:305) > at > org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils.getTypeInfosFromTypeString(TypeInfoUtils.java:765) > at > org.apache.hadoop.hive.ql.io.orc.OrcSerde.initialize(OrcSerde.java:104) > at > org.apache.spark.sql.hive.orc.OrcSerializer.(OrcFileFormat.scala:182) > at > org.apache.spark.sql.hive.orc.OrcOutputWriter.(OrcFileFormat.scala:225) > at > org.apache.spark.sql.hive.orc.OrcFileFormat$$anon$1.newInstance(OrcFileFormat.scala:94) > at > org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:131) > at > org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:247) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143) > at >
[jira] [Created] (SPARK-18789) Save Data frame with Null column exception
Harish created SPARK-18789: -- Summary: Save Data frame with Null column exception Key: SPARK-18789 URL: https://issues.apache.org/jira/browse/SPARK-18789 Project: Spark Issue Type: Bug Affects Versions: 2.0.2 Reporter: Harish I am trying to save a DF to HDFS which is having 1 column is NULL(no data). col1 col2 col3 a 1 null b 1 null c1null d 1 null code : df.write.format("orc").save(path, mode='overwrite') Error: java.lang.IllegalArgumentException: Error: type expected at the position 49 of 'string:string:string:double:string:double:string:null' but 'null' is found. at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:348) at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:331) at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseType(TypeInfoUtils.java:392) at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseTypeInfos(TypeInfoUtils.java:305) at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils.getTypeInfosFromTypeString(TypeInfoUtils.java:765) at org.apache.hadoop.hive.ql.io.orc.OrcSerde.initialize(OrcSerde.java:104) at org.apache.spark.sql.hive.orc.OrcSerializer.(OrcFileFormat.scala:182) at org.apache.spark.sql.hive.orc.OrcOutputWriter.(OrcFileFormat.scala:225) at org.apache.spark.sql.hive.orc.OrcFileFormat$$anon$1.newInstance(OrcFileFormat.scala:94) at org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:131) at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:247) at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143) at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) at org.apache.spark.scheduler.Task.run(Task.scala:86) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 16/12/08 19:41:49 ERROR TaskSetManager: Task 17 in stage 512.0 failed 4 times; aborting job 16/12/08 19:41:49 ERROR InsertIntoHadoopFsRelationCommand: Aborting job. org.apache.spark.SparkException: Job aborted due to stage failure: Task 17 in stage 512.0 failed 4 times, most recent failure: Lost task 17.3 in stage 512.0 (TID 37290, 10.63.136.108): java.lang.IllegalArgumentException: Error: type expected at the position 49 of 'string:string:string:double:string:double:string:null' but 'null' is found. 
at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:348) at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:331) at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseType(TypeInfoUtils.java:392) at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseTypeInfos(TypeInfoUtils.java:305) at org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils.getTypeInfosFromTypeString(TypeInfoUtils.java:765) at org.apache.hadoop.hive.ql.io.orc.OrcSerde.initialize(OrcSerde.java:104) at org.apache.spark.sql.hive.orc.OrcSerializer.(OrcFileFormat.scala:182) at org.apache.spark.sql.hive.orc.OrcOutputWriter.(OrcFileFormat.scala:225) at org.apache.spark.sql.hive.orc.OrcFileFormat$$anon$1.newInstance(OrcFileFormat.scala:94) at org.apache.spark.sql.execution.datasources.BaseWriterContainer.newOutputWriter(WriterContainer.scala:131) at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:247) at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143) at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) at org.apache.spark.scheduler.Task.run(Task.scala:86) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) at
[jira] [Updated] (SPARK-18496) java.lang.AssertionError: assertion failed
[ https://issues.apache.org/jira/browse/SPARK-18496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish updated SPARK-18496: --- Affects Version/s: 2.0.2 > java.lang.AssertionError: assertion failed > -- > > Key: SPARK-18496 > URL: https://issues.apache.org/jira/browse/SPARK-18496 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.2 > Environment: 2.0.2 snapshot >Reporter: Harish > > I am getting this error when i store the estimates from Julia output to a DF > and then i do df.cache() > py4j.protocol.Py4JJavaError: An error occurred while calling > z:org.apache.spark.api.python.PythonRDD.runJob. > : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 > in stage 177.0 failed 4 times, most recent failure: Lost task 0.3 in stage > 177.0 (TID 9722, 10.63.136.108): java.lang.AssertionError: assertion failed > at scala.Predef$.assert(Predef.scala:156) > at > org.apache.spark.storage.BlockInfo.checkInvariants(BlockInfoManager.scala:84) > at > org.apache.spark.storage.BlockInfo.readerCount_$eq(BlockInfoManager.scala:66) > at > org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:362) > at > org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:361) > at scala.Option.foreach(Option.scala:257) > at > org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:361) > at > org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:356) > at scala.collection.Iterator$class.foreach(Iterator.scala:893) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) > at > org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:356) > at > org.apache.spark.storage.BlockManager.releaseAllLocksForTask(BlockManager.scala:646) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:281) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Driver stacktrace: > at > org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1454) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1442) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1441) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at > org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1441) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811) > at scala.Option.foreach(Option.scala:257) > at > org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1667) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1622) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1611) > at 
org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) > at > org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:632) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1890) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1903) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1916) > at org.apache.spark.api.python.PythonRDD$.runJob(PythonRDD.scala:441) > at org.apache.spark.api.python.PythonRDD.runJob(PythonRDD.scala) > at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237) > at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) > at py4j.Gateway.invoke(Gateway.java:280) > at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) > at py4j.commands.CallCommand.execute(CallCommand.java:79) > at
[jira] [Created] (SPARK-18496) java.lang.AssertionError: assertion failed
Harish created SPARK-18496: -- Summary: java.lang.AssertionError: assertion failed Key: SPARK-18496 URL: https://issues.apache.org/jira/browse/SPARK-18496 Project: Spark Issue Type: Bug Environment: 2.0.2 snapshot Reporter: Harish I am getting this error when i store the estimates from Julia output to a DF and then i do df.cache() py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 177.0 failed 4 times, most recent failure: Lost task 0.3 in stage 177.0 (TID 9722, 10.63.136.108): java.lang.AssertionError: assertion failed at scala.Predef$.assert(Predef.scala:156) at org.apache.spark.storage.BlockInfo.checkInvariants(BlockInfoManager.scala:84) at org.apache.spark.storage.BlockInfo.readerCount_$eq(BlockInfoManager.scala:66) at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:362) at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2$$anonfun$apply$2.apply(BlockInfoManager.scala:361) at scala.Option.foreach(Option.scala:257) at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:361) at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$2.apply(BlockInfoManager.scala:356) at scala.collection.Iterator$class.foreach(Iterator.scala:893) at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) at org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:356) at org.apache.spark.storage.BlockManager.releaseAllLocksForTask(BlockManager.scala:646) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:281) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1454) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1442) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1441) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1441) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811) at scala.Option.foreach(Option.scala:257) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1667) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1622) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1611) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:632) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1890) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1903) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1916) at 
org.apache.spark.api.python.PythonRDD$.runJob(PythonRDD.scala:441) at org.apache.spark.api.python.PythonRDD.runJob(PythonRDD.scala) at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:280) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:214) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.AssertionError: assertion failed at scala.Predef$.assert(Predef.scala:156) at org.apache.spark.storage.BlockInfo.checkInvariants(BlockInfoManager.scala:84) at
[jira] [Commented] (SPARK-16845) org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB
[ https://issues.apache.org/jira/browse/SPARK-16845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15665010#comment-15665010 ] Harish commented on SPARK-16845: This issue is still open - https://github.com/apache/spark/pull/15480 > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" > grows beyond 64 KB > - > > Key: SPARK-16845 > URL: https://issues.apache.org/jira/browse/SPARK-16845 > Project: Spark > Issue Type: Bug > Components: Java API, ML, MLlib >Affects Versions: 2.0.0 >Reporter: hejie > Attachments: error.txt.zip > > > I have a wide table(400 columns), when I try fitting the traindata on all > columns, the fatal error occurs. > ... 46 more > Caused by: org.codehaus.janino.JaninoRuntimeException: Code of method > "(Lorg/apache/spark/sql/catalyst/InternalRow;Lorg/apache/spark/sql/catalyst/InternalRow;)I" > of class > "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" > grows beyond 64 KB > at org.codehaus.janino.CodeContext.makeSpace(CodeContext.java:941) > at org.codehaus.janino.CodeContext.write(CodeContext.java:854) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17463) Serialization of accumulators in heartbeats is not thread-safe
[ https://issues.apache.org/jira/browse/SPARK-17463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15646453#comment-15646453 ] Harish commented on SPARK-17463: I was able to figure out the issue, its not related to this bug. > Serialization of accumulators in heartbeats is not thread-safe > -- > > Key: SPARK-17463 > URL: https://issues.apache.org/jira/browse/SPARK-17463 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Josh Rosen >Assignee: Shixiong Zhu >Priority: Critical > Fix For: 2.0.1, 2.1.0 > > > Check out the following {{ConcurrentModificationException}}: > {code} > 16/09/06 16:10:29 WARN NettyRpcEndpointRef: Error sending message [message = > Heartbeat(2,[Lscala.Tuple2;@66e7b6e7,BlockManagerId(2, HOST, 57743))] in 1 > attempts > org.apache.spark.SparkException: Exception thrown in awaitResult > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at scala.PartialFunction$OrElse.apply(PartialFunction.scala:162) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) > at > org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:518) > at > org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:547) > at > org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:547) > at > org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:547) > at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1862) > at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:547) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.util.ConcurrentModificationException > at java.util.ArrayList.writeObject(ArrayList.java:766) > at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1028) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378) > at
[jira] [Created] (SPARK-18238) WARN Executor: 1 block locks were not released by TID
Harish created SPARK-18238: -- Summary: WARN Executor: 1 block locks were not released by TID Key: SPARK-18238 URL: https://issues.apache.org/jira/browse/SPARK-18238 Project: Spark Issue Type: Bug Environment: 2.0.2 snapshot Reporter: Harish Priority: Minor In Spark 2.0.2/Hadoop 2.7, I am getting the messages below. I am not sure whether this impacts my execution. 16/11/03 01:10:23 WARN Executor: 1 block locks were not released by TID = 30541: [rdd_511_104] 16/11/03 01:10:23 WARN Executor: 1 block locks were not released by TID = 30542: [rdd_511_105] 16/11/03 01:10:23 WARN Executor: 1 block locks were not released by TID = 30562: [rdd_511_127] 16/11/03 01:10:23 WARN Executor: 1 block locks were not released by TID = 30571: [rdd_511_137] 16/11/03 01:10:23 WARN Executor: 1 block locks were not released by TID = 30572: [rdd_511_138] 16/11/03 01:10:23 WARN Executor: 1 block locks were not released by TID = 30588: [rdd_511_156] 16/11/03 01:10:23 WARN Executor: 1 block locks were not released by TID = 30603: [rdd_511_171] 16/11/03 01:10:23 WARN Executor: 1 block locks were not released by TID = 30600: [rdd_511_168] 16/11/03 01:10:23 WARN Executor: 1 block locks were not released by TID = 30612: [rdd_511_180] 16/11/03 01:10:23 WARN Executor: 1 block locks were not released by TID = 30622: [rdd_511_190] 16/11/03 01:10:23 WARN Executor: 1 block locks were not released by TID = 30629: [rdd_511_197] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-16740) joins.LongToUnsafeRowMap crashes with NegativeArraySizeException
[ https://issues.apache.org/jira/browse/SPARK-16740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15620885#comment-15620885 ] Harish edited comment on SPARK-16740 at 10/31/16 12:38 PM: --- Thank you. I downloaded the 2.0.2 snapshot with 2.7 Hadoop (i think its on 10/13). I can still reproduce this issue. If the "2.0.2-rc1" was updated after 10/13 then i will take the updates and try. Can you please help me to find the latest download path.? I am going to try 2.0.3 snap shot from below location -- any suggestions? http://people.apache.org/~pwendell/spark-nightly/spark-branch-2.0-bin/latest/spark-2.0.3-SNAPSHOT-bin-hadoop2.7.tgz was (Author: harishk15): Thank you. I downloaded the 2.0.2 snapshot with 2.7 Hadoop (i think its on 10/13). I can still reproduce this issue. If the "2.0.2-rc1" was updated after 10/13 then i will take the updates and try. Can you please help me to find the latest download path.? > joins.LongToUnsafeRowMap crashes with NegativeArraySizeException > > > Key: SPARK-16740 > URL: https://issues.apache.org/jira/browse/SPARK-16740 > Project: Spark > Issue Type: Bug > Components: PySpark, Spark Core, SQL >Affects Versions: 2.0.0 >Reporter: Sylvain Zimmer >Assignee: Sylvain Zimmer > Fix For: 2.0.1, 2.1.0 > > > Hello, > Here is a crash in Spark SQL joins, with a minimal reproducible test case. > Interestingly, it only seems to happen when reading Parquet data (I added a > {{crash = True}} variable to show it) > This is an {{left_outer}} example, but it also crashes with a regular > {{inner}} join. > {code} > import os > from pyspark import SparkContext > from pyspark.sql import types as SparkTypes > from pyspark.sql import SQLContext > sc = SparkContext() > sqlc = SQLContext(sc) > schema1 = SparkTypes.StructType([ > SparkTypes.StructField("id1", SparkTypes.LongType(), nullable=True) > ]) > schema2 = SparkTypes.StructType([ > SparkTypes.StructField("id2", SparkTypes.LongType(), nullable=True) > ]) > # Valid Long values (-9223372036854775808 < -5543241376386463808 , > 4661454128115150227 < 9223372036854775807) > data1 = [(4661454128115150227,), (-5543241376386463808,)] > data2 = [(650460285, )] > df1 = sqlc.createDataFrame(sc.parallelize(data1), schema1) > df2 = sqlc.createDataFrame(sc.parallelize(data2), schema2) > crash = True > if crash: > os.system("rm -rf /tmp/sparkbug") > df1.write.parquet("/tmp/sparkbug/vertex") > df2.write.parquet("/tmp/sparkbug/edge") > df1 = sqlc.read.load("/tmp/sparkbug/vertex") > df2 = sqlc.read.load("/tmp/sparkbug/edge") > result_df = df2.join(df1, on=(df1.id1 == df2.id2), how="left_outer") > # Should print [Row(id2=650460285, id1=None)] > print result_df.collect() > {code} > When ran with {{spark-submit}}, the final {{collect()}} call crashes with > this: > {code} > py4j.protocol.Py4JJavaError: An error occurred while calling > o61.collectToPython. 
> : org.apache.spark.SparkException: Exception thrown in awaitResult: > at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:194) > at > org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:120) > at > org.apache.spark.sql.execution.InputAdapter.doExecuteBroadcast(WholeStageCodegenExec.scala:229) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) > at > org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:124) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.prepareBroadcast(BroadcastHashJoinExec.scala:98) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenOuter(BroadcastHashJoinExec.scala:242) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:83) > at > org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153) > at > org.apache.spark.sql.execution.BatchedDataSourceScanExec.consume(ExistingRDD.scala:225) > at > org.apache.spark.sql.execution.BatchedDataSourceScanExec.doProduce(ExistingRDD.scala:328) > at > org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83) > at >
[jira] [Commented] (SPARK-16740) joins.LongToUnsafeRowMap crashes with NegativeArraySizeException
[ https://issues.apache.org/jira/browse/SPARK-16740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15620885#comment-15620885 ] Harish commented on SPARK-16740: Thank you. I downloaded the 2.0.2 snapshot with 2.7 Hadoop (i think its on 10/13). I can still reproduce this issue. If the "2.0.2-rc1" was updated after 10/13 then i will take the updates and try. Can you please help me to find the latest download path.? > joins.LongToUnsafeRowMap crashes with NegativeArraySizeException > > > Key: SPARK-16740 > URL: https://issues.apache.org/jira/browse/SPARK-16740 > Project: Spark > Issue Type: Bug > Components: PySpark, Spark Core, SQL >Affects Versions: 2.0.0 >Reporter: Sylvain Zimmer >Assignee: Sylvain Zimmer > Fix For: 2.0.1, 2.1.0 > > > Hello, > Here is a crash in Spark SQL joins, with a minimal reproducible test case. > Interestingly, it only seems to happen when reading Parquet data (I added a > {{crash = True}} variable to show it) > This is an {{left_outer}} example, but it also crashes with a regular > {{inner}} join. > {code} > import os > from pyspark import SparkContext > from pyspark.sql import types as SparkTypes > from pyspark.sql import SQLContext > sc = SparkContext() > sqlc = SQLContext(sc) > schema1 = SparkTypes.StructType([ > SparkTypes.StructField("id1", SparkTypes.LongType(), nullable=True) > ]) > schema2 = SparkTypes.StructType([ > SparkTypes.StructField("id2", SparkTypes.LongType(), nullable=True) > ]) > # Valid Long values (-9223372036854775808 < -5543241376386463808 , > 4661454128115150227 < 9223372036854775807) > data1 = [(4661454128115150227,), (-5543241376386463808,)] > data2 = [(650460285, )] > df1 = sqlc.createDataFrame(sc.parallelize(data1), schema1) > df2 = sqlc.createDataFrame(sc.parallelize(data2), schema2) > crash = True > if crash: > os.system("rm -rf /tmp/sparkbug") > df1.write.parquet("/tmp/sparkbug/vertex") > df2.write.parquet("/tmp/sparkbug/edge") > df1 = sqlc.read.load("/tmp/sparkbug/vertex") > df2 = sqlc.read.load("/tmp/sparkbug/edge") > result_df = df2.join(df1, on=(df1.id1 == df2.id2), how="left_outer") > # Should print [Row(id2=650460285, id1=None)] > print result_df.collect() > {code} > When ran with {{spark-submit}}, the final {{collect()}} call crashes with > this: > {code} > py4j.protocol.Py4JJavaError: An error occurred while calling > o61.collectToPython. 
> : org.apache.spark.SparkException: Exception thrown in awaitResult: > at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:194) > at > org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:120) > at > org.apache.spark.sql.execution.InputAdapter.doExecuteBroadcast(WholeStageCodegenExec.scala:229) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) > at > org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:124) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.prepareBroadcast(BroadcastHashJoinExec.scala:98) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenOuter(BroadcastHashJoinExec.scala:242) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:83) > at > org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153) > at > org.apache.spark.sql.execution.BatchedDataSourceScanExec.consume(ExistingRDD.scala:225) > at > org.apache.spark.sql.execution.BatchedDataSourceScanExec.doProduce(ExistingRDD.scala:328) > at > org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83) > at > org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) > at > org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78) > at >
[jira] [Commented] (SPARK-16740) joins.LongToUnsafeRowMap crashes with NegativeArraySizeException
[ https://issues.apache.org/jira/browse/SPARK-16740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15620711#comment-15620711 ] Harish commented on SPARK-16740: is this fix is available in 2.0.2 snapshot?. Please confirm > joins.LongToUnsafeRowMap crashes with NegativeArraySizeException > > > Key: SPARK-16740 > URL: https://issues.apache.org/jira/browse/SPARK-16740 > Project: Spark > Issue Type: Bug > Components: PySpark, Spark Core, SQL >Affects Versions: 2.0.0 >Reporter: Sylvain Zimmer >Assignee: Sylvain Zimmer > Fix For: 2.0.1, 2.1.0 > > > Hello, > Here is a crash in Spark SQL joins, with a minimal reproducible test case. > Interestingly, it only seems to happen when reading Parquet data (I added a > {{crash = True}} variable to show it) > This is an {{left_outer}} example, but it also crashes with a regular > {{inner}} join. > {code} > import os > from pyspark import SparkContext > from pyspark.sql import types as SparkTypes > from pyspark.sql import SQLContext > sc = SparkContext() > sqlc = SQLContext(sc) > schema1 = SparkTypes.StructType([ > SparkTypes.StructField("id1", SparkTypes.LongType(), nullable=True) > ]) > schema2 = SparkTypes.StructType([ > SparkTypes.StructField("id2", SparkTypes.LongType(), nullable=True) > ]) > # Valid Long values (-9223372036854775808 < -5543241376386463808 , > 4661454128115150227 < 9223372036854775807) > data1 = [(4661454128115150227,), (-5543241376386463808,)] > data2 = [(650460285, )] > df1 = sqlc.createDataFrame(sc.parallelize(data1), schema1) > df2 = sqlc.createDataFrame(sc.parallelize(data2), schema2) > crash = True > if crash: > os.system("rm -rf /tmp/sparkbug") > df1.write.parquet("/tmp/sparkbug/vertex") > df2.write.parquet("/tmp/sparkbug/edge") > df1 = sqlc.read.load("/tmp/sparkbug/vertex") > df2 = sqlc.read.load("/tmp/sparkbug/edge") > result_df = df2.join(df1, on=(df1.id1 == df2.id2), how="left_outer") > # Should print [Row(id2=650460285, id1=None)] > print result_df.collect() > {code} > When ran with {{spark-submit}}, the final {{collect()}} call crashes with > this: > {code} > py4j.protocol.Py4JJavaError: An error occurred while calling > o61.collectToPython. 
> : org.apache.spark.SparkException: Exception thrown in awaitResult: > at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:194) > at > org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:120) > at > org.apache.spark.sql.execution.InputAdapter.doExecuteBroadcast(WholeStageCodegenExec.scala:229) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) > at > org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:124) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.prepareBroadcast(BroadcastHashJoinExec.scala:98) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenOuter(BroadcastHashJoinExec.scala:242) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:83) > at > org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153) > at > org.apache.spark.sql.execution.BatchedDataSourceScanExec.consume(ExistingRDD.scala:225) > at > org.apache.spark.sql.execution.BatchedDataSourceScanExec.doProduce(ExistingRDD.scala:328) > at > org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83) > at > org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) > at > org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78) > at > org.apache.spark.sql.execution.BatchedDataSourceScanExec.produce(ExistingRDD.scala:225) > at > org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doProduce(BroadcastHashJoinExec.scala:77) > at >
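Editor's note: since the question in the comment above is whether the fix is present in a particular nightly build, one quick sanity check — shown here as an illustrative sketch — is to print the version string of the session the job is actually running under, which reflects the snapshot the cluster was started from.
{code}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# The version string identifies the build the cluster is running,
# e.g. '2.0.2-SNAPSHOT' for a nightly of branch-2.0.
print(spark.version)
print(spark.sparkContext.version)
{code}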
[jira] [Commented] (SPARK-16522) [MESOS] Spark application throws exception on exit
[ https://issues.apache.org/jira/browse/SPARK-16522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15620225#comment-15620225 ] Harish commented on SPARK-16522: I am getting same error in spark 2.0.2 snapshot. Standalone submission. py4j.protocol.Py4JJavaError: An error occurred while calling o37785.count. : org.apache.spark.SparkException: Exception thrown in awaitResult: at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:194) at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:120) at org.apache.spark.sql.execution.InputAdapter.doExecuteBroadcast(WholeStageCodegenExec.scala:229) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) at org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:124) at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.prepareBroadcast(BroadcastHashJoinExec.scala:98) at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenInner(BroadcastHashJoinExec.scala:197) at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:82) at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153) at org.apache.spark.sql.execution.ProjectExec.consume(basicPhysicalOperators.scala:30) at org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:62) at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153) at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.consume(BroadcastHashJoinExec.scala:38) at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenInner(BroadcastHashJoinExec.scala:232) at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:82) at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153) at org.apache.spark.sql.execution.ProjectExec.consume(basicPhysicalOperators.scala:30) at org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:62) at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153) at org.apache.spark.sql.execution.FilterExec.consume(basicPhysicalOperators.scala:79) at org.apache.spark.sql.execution.FilterExec.doConsume(basicPhysicalOperators.scala:194) at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153) at org.apache.spark.sql.execution.InputAdapter.consume(WholeStageCodegenExec.scala:218) at org.apache.spark.sql.execution.InputAdapter.doProduce(WholeStageCodegenExec.scala:244) at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83) at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) at 
org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78) at org.apache.spark.sql.execution.InputAdapter.produce(WholeStageCodegenExec.scala:218) at org.apache.spark.sql.execution.FilterExec.doProduce(basicPhysicalOperators.scala:113) at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83) at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78) at org.apache.spark.sql.execution.FilterExec.produce(basicPhysicalOperators.scala:79) at org.apache.spark.sql.execution.ProjectExec.doProduce(basicPhysicalOperators.scala:40) at
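Editor's note: the stack trace above fails while waiting for a broadcast exchange. As a hedged aside — these are general broadcast-join knobs, not a confirmed fix for this ticket — the two settings people usually reach for in this situation are the broadcast timeout and the auto-broadcast threshold (setting the threshold to -1 disables broadcast hash joins so the plan falls back to a shuffle join). Illustrative values below.
{code}
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         # Give slow broadcast exchanges more time (in seconds); the default is 300.
         .config("spark.sql.broadcastTimeout", "600")
         # Or disable automatic broadcast joins entirely.
         .config("spark.sql.autoBroadcastJoinThreshold", "-1")
         .getOrCreate())
{code}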
[jira] [Created] (SPARK-17948) WARN CodeGenerator: Error calculating stats of compiled class
Harish created SPARK-17948: -- Summary: WARN CodeGenerator: Error calculating stats of compiled class Key: SPARK-17948 URL: https://issues.apache.org/jira/browse/SPARK-17948 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 2.0.1 Reporter: Harish I am getting below error in 2.0.2 snapshot. I use pyspark 16/10/14 22:33:25 WARN CodeGenerator: Error calculating stats of compiled class. java.lang.IndexOutOfBoundsException: Index: 2659, Size: 1 at java.util.ArrayList.rangeCheck(ArrayList.java:653) at java.util.ArrayList.get(ArrayList.java:429) at org.codehaus.janino.util.ClassFile.getConstantPoolInfo(ClassFile.java:457) at org.codehaus.janino.util.ClassFile.getConstantUtf8(ClassFile.java:469) at org.codehaus.janino.util.ClassFile.loadAttribute(ClassFile.java:1387) at org.codehaus.janino.util.ClassFile.loadAttributes(ClassFile.java:555) at org.codehaus.janino.util.ClassFile.loadFields(ClassFile.java:518) at org.codehaus.janino.util.ClassFile.(ClassFile.java:185) at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$recordCompilationStats$1.apply(CodeGenerator.scala:919) at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$recordCompilationStats$1.apply(CodeGenerator.scala:916) at scala.collection.Iterator$class.foreach(Iterator.scala:893) at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) at scala.collection.AbstractIterable.foreach(Iterable.scala:54) at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.recordCompilationStats(CodeGenerator.scala:916) at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:888) at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:950) at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:947) at org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599) at org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379) at org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) at org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000) at org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) at org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874) at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:841) at org.apache.spark.sql.catalyst.expressions.codegen.GenerateMutableProjection$.create(GenerateMutableProjection.scala:140) at org.apache.spark.sql.catalyst.expressions.codegen.GenerateMutableProjection$.generate(GenerateMutableProjection.scala:44) at org.apache.spark.sql.execution.SparkPlan.newMutableProjection(SparkPlan.scala:369) at org.apache.spark.sql.execution.aggregate.HashAggregateExec$$anonfun$doExecute$1$$anonfun$4$$anonfun$5.apply(HashAggregateExec.scala:110) at org.apache.spark.sql.execution.aggregate.HashAggregateExec$$anonfun$doExecute$1$$anonfun$4$$anonfun$5.apply(HashAggregateExec.scala:109) at org.apache.spark.sql.execution.aggregate.AggregationIterator.generateProcessRow(AggregationIterator.scala:179) at org.apache.spark.sql.execution.aggregate.AggregationIterator.(AggregationIterator.scala:198) at 
org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.(TungstenAggregationIterator.scala:92) at org.apache.spark.sql.execution.aggregate.HashAggregateExec$$anonfun$doExecute$1$$anonfun$4.apply(HashAggregateExec.scala:103) at org.apache.spark.sql.execution.aggregate.HashAggregateExec$$anonfun$doExecute$1$$anonfun$4.apply(HashAggregateExec.scala:94) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:785) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:785) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at
[jira] [Resolved] (SPARK-17942) OpenJDK 64-Bit Server VM warning: Try increasing the code cache size using -XX:ReservedCodeCacheSize=
[ https://issues.apache.org/jira/browse/SPARK-17942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish resolved SPARK-17942. Resolution: Works for Me > OpenJDK 64-Bit Server VM warning: Try increasing the code cache size using > -XX:ReservedCodeCacheSize= > - > > Key: SPARK-17942 > URL: https://issues.apache.org/jira/browse/SPARK-17942 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.0.1 >Reporter: Harish >Priority: Minor > > My code snipped is in below location. In that snippet i had put only few > columns, but in my test case i have data with 10M rows and 10,000 columns. > http://stackoverflow.com/questions/39602596/convert-groupbykey-to-reducebykey-pyspark > I see below message in spark 2.0.2 snapshot > # Stderr of the node > OpenJDK 64-Bit Server VM warning: CodeCache is full. Compiler has been > disabled. > OpenJDK 64-Bit Server VM warning: Try increasing the code cache size using > -XX:ReservedCodeCacheSize= > # stdout of the node > CodeCache: size=245760Kb used=242680Kb max_used=242689Kb free=3079Kb > bounds [0x7f32c500, 0x7f32d400, 0x7f32d400] > total_blobs=41388 nmethods=40792 adapters=501 > compilation: disabled (not enough contiguous free space left) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-17942) OpenJDK 64-Bit Server VM warning: Try increasing the code cache size using -XX:ReservedCodeCacheSize=
[ https://issues.apache.org/jira/browse/SPARK-17942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576375#comment-15576375 ] Harish edited comment on SPARK-17942 at 10/14/16 8:20 PM: -- --conf "spark.executor.extraJavaOptions=-XX:ReservedCodeCacheSize=600m" -- will work. thanks Please close this. was (Author: harishk15): --conf "spark.executor.extraJavaOptions=-XX:ReservedCodeCacheSize=600m" -- will work. thanks > OpenJDK 64-Bit Server VM warning: Try increasing the code cache size using > -XX:ReservedCodeCacheSize= > - > > Key: SPARK-17942 > URL: https://issues.apache.org/jira/browse/SPARK-17942 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.0.1 >Reporter: Harish >Priority: Minor > > My code snipped is in below location. In that snippet i had put only few > columns, but in my test case i have data with 10M rows and 10,000 columns. > http://stackoverflow.com/questions/39602596/convert-groupbykey-to-reducebykey-pyspark > I see below message in spark 2.0.2 snapshot > # Stderr of the node > OpenJDK 64-Bit Server VM warning: CodeCache is full. Compiler has been > disabled. > OpenJDK 64-Bit Server VM warning: Try increasing the code cache size using > -XX:ReservedCodeCacheSize= > # stdout of the node > CodeCache: size=245760Kb used=242680Kb max_used=242689Kb free=3079Kb > bounds [0x7f32c500, 0x7f32d400, 0x7f32d400] > total_blobs=41388 nmethods=40792 adapters=501 > compilation: disabled (not enough contiguous free space left) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17942) OpenJDK 64-Bit Server VM warning: Try increasing the code cache size using -XX:ReservedCodeCacheSize=
[ https://issues.apache.org/jira/browse/SPARK-17942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576375#comment-15576375 ] Harish commented on SPARK-17942: --conf "spark.executor.extraJavaOptions=-XX:ReservedCodeCacheSize=600m" -- will work. thanks > OpenJDK 64-Bit Server VM warning: Try increasing the code cache size using > -XX:ReservedCodeCacheSize= > - > > Key: SPARK-17942 > URL: https://issues.apache.org/jira/browse/SPARK-17942 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.0.1 >Reporter: Harish >Priority: Minor > > My code snipped is in below location. In that snippet i had put only few > columns, but in my test case i have data with 10M rows and 10,000 columns. > http://stackoverflow.com/questions/39602596/convert-groupbykey-to-reducebykey-pyspark > I see below message in spark 2.0.2 snapshot > # Stderr of the node > OpenJDK 64-Bit Server VM warning: CodeCache is full. Compiler has been > disabled. > OpenJDK 64-Bit Server VM warning: Try increasing the code cache size using > -XX:ReservedCodeCacheSize= > # stdout of the node > CodeCache: size=245760Kb used=242680Kb max_used=242689Kb free=3079Kb > bounds [0x7f32c500, 0x7f32d400, 0x7f32d400] > total_blobs=41388 nmethods=40792 adapters=501 > compilation: disabled (not enough contiguous free space left) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
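Editor's note: the flag from the comment above is a JVM option, so it has to be supplied when the JVMs are launched (e.g. via spark-submit or spark-defaults.conf), not from inside application code. A typical invocation would look roughly like the following; the application file name and the 600m value are only examples.
{code}
spark-submit \
  --conf "spark.driver.extraJavaOptions=-XX:ReservedCodeCacheSize=600m" \
  --conf "spark.executor.extraJavaOptions=-XX:ReservedCodeCacheSize=600m" \
  my_app.py
{code}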
[jira] [Updated] (SPARK-17942) OpenJDK 64-Bit Server VM warning: Try increasing the code cache size using -XX:ReservedCodeCacheSize=
[ https://issues.apache.org/jira/browse/SPARK-17942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish updated SPARK-17942: --- Priority: Minor (was: Major) > OpenJDK 64-Bit Server VM warning: Try increasing the code cache size using > -XX:ReservedCodeCacheSize= > - > > Key: SPARK-17942 > URL: https://issues.apache.org/jira/browse/SPARK-17942 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.0.1 >Reporter: Harish >Priority: Minor > > My code snipped is in below location. In that snippet i had put only few > columns, but in my test case i have data with 10M rows and 10,000 columns. > http://stackoverflow.com/questions/39602596/convert-groupbykey-to-reducebykey-pyspark > I see below message in spark 2.0.2 snapshot > # Stderr of the node > OpenJDK 64-Bit Server VM warning: CodeCache is full. Compiler has been > disabled. > OpenJDK 64-Bit Server VM warning: Try increasing the code cache size using > -XX:ReservedCodeCacheSize= > # stdout of the node > CodeCache: size=245760Kb used=242680Kb max_used=242689Kb free=3079Kb > bounds [0x7f32c500, 0x7f32d400, 0x7f32d400] > total_blobs=41388 nmethods=40792 adapters=501 > compilation: disabled (not enough contiguous free space left) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17942) OpenJDK 64-Bit Server VM warning: Try increasing the code cache size using -XX:ReservedCodeCacheSize=
Harish created SPARK-17942: -- Summary: OpenJDK 64-Bit Server VM warning: Try increasing the code cache size using -XX:ReservedCodeCacheSize= Key: SPARK-17942 URL: https://issues.apache.org/jira/browse/SPARK-17942 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 2.0.1 Reporter: Harish My code snippet is at the location below. That snippet includes only a few columns, but my test case uses data with 10M rows and 10,000 columns. http://stackoverflow.com/questions/39602596/convert-groupbykey-to-reducebykey-pyspark I see the message below in a Spark 2.0.2 snapshot. # Stderr of the node OpenJDK 64-Bit Server VM warning: CodeCache is full. Compiler has been disabled. OpenJDK 64-Bit Server VM warning: Try increasing the code cache size using -XX:ReservedCodeCacheSize= # stdout of the node CodeCache: size=245760Kb used=242680Kb max_used=242689Kb free=3079Kb bounds [0x7f32c500, 0x7f32d400, 0x7f32d400] total_blobs=41388 nmethods=40792 adapters=501 compilation: disabled (not enough contiguous free space left) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16845) org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB
[ https://issues.apache.org/jira/browse/SPARK-16845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15575421#comment-15575421 ] Harish commented on SPARK-16845: I have posted a scenario in stack overflow http://stackoverflow.com/questions/40044779/find-mean-and-corr-of-10-000-columns-in-pyspark-dataframe ... let me know if you need any help on this. If this is already taken care you can ignore my comment. > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" > grows beyond 64 KB > - > > Key: SPARK-16845 > URL: https://issues.apache.org/jira/browse/SPARK-16845 > Project: Spark > Issue Type: Bug > Components: Java API, ML, MLlib >Affects Versions: 2.0.0 >Reporter: hejie > > I have a wide table(400 columns), when I try fitting the traindata on all > columns, the fatal error occurs. > ... 46 more > Caused by: org.codehaus.janino.JaninoRuntimeException: Code of method > "(Lorg/apache/spark/sql/catalyst/InternalRow;Lorg/apache/spark/sql/catalyst/InternalRow;)I" > of class > "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" > grows beyond 64 KB > at org.codehaus.janino.CodeContext.makeSpace(CodeContext.java:941) > at org.codehaus.janino.CodeContext.write(CodeContext.java:854) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-17908) Column names Corrupted in pysaprk dataframe groupBy
[ https://issues.apache.org/jira/browse/SPARK-17908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572478#comment-15572478 ] Harish edited comment on SPARK-17908 at 10/13/16 4:58 PM: -- Yes. Your code structure is same as mine.. But i have 70M records with 1000 columns. It works with simple joins as above. But when you try to modify the DF multiple times this will happen, as i was getting this error from 1.6.0 but i didn't raise because i cant prove this with working use case. But it happens frequently with my code so i tried with rename Here my steps: df = df.select('key1', 'key2', 'key3', 'val','total') -70Million records df =df.withColumn('key2', 'ABC') df1= df.groupBy('key1', 'key2', 'key3').agg(func.count(func.col('val')).alias('total')) df1 = df1.columnRenamed('key2', 'key2') df3 =df.join(df1, ['key1', 'key2', 'key3'])\ .withcolumn('newcol', func.col('val')/func.col('total')) I just wanted to see if any one else observed this behavior, I will try to find the code sample to proof this issue. If not in another 1-2 days i will mark it not reproducible. was (Author: harishk15): Yes. You are code structure is same as mine.. But i have 70M records with 1000 columns. It works with simple joins as above. But when you try to modify the DF multiple times this will happen, as i was getting this error from 1.6.0 but i didn't raise because i cant prove this with working use case. But it happens frequently with my code so i tried with rename Here my steps: df = df.select('key1', 'key2', 'key3', 'val','total') -70Million records df =df.withColumn('key2', 'ABC') df1= df.groupBy('key1', 'key2', 'key3').agg(func.count(func.col('val')).alias('total')) df1 = df1.columnRenamed('key2', 'key2') df3 =df.join(df1, ['key1', 'key2', 'key3'])\ .withcolumn('newcol', func.col('val')/func.col('total')) I just wanted to see if any one else observed this behavior, I will try to find the code sample to proof this issue. If not in another 1-2 days i will mark it not reproducible. > Column names Corrupted in pysaprk dataframe groupBy > --- > > Key: SPARK-17908 > URL: https://issues.apache.org/jira/browse/SPARK-17908 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 1.6.0, 1.6.1, 1.6.2, 2.0.0, 2.0.1 >Reporter: Harish >Priority: Minor > > I have DF say df > df1= df.groupBy('key1', 'key2', > 'key3').agg(func.count(func.col('val')).alias('total')) > df3 =df.join(df1, ['key1', 'key2', 'key3'])\ > .withcolumn('newcol', func.col('val')/func.col('total')) > I am getting key2 is not present in df1, which is not truw becuase df1.show > () is having the data with the key2. > Then i added this code before join-- df1 = df1.columnRenamed('key2', 'key2') > renamed with same name. Then it works. > Stack trace will say column missing, but it is npt. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17908) Column names Corrupted in pysaprk dataframe groupBy
[ https://issues.apache.org/jira/browse/SPARK-17908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572478#comment-15572478 ] Harish commented on SPARK-17908: Yes, your code structure is the same as mine, but I have 70M records with 1000 columns. It works with simple joins as above, but when you modify the DF multiple times this happens. I had been getting this error since 1.6.0 but did not report it because I could not prove it with a working use case; it happens frequently in my code, so I tried the rename. Here are my steps: df = df.select('key1', 'key2', 'key3', 'val', 'total') -- 70 million records df = df.withColumn('key2', 'ABC') df1 = df.groupBy('key1', 'key2', 'key3').agg(func.count(func.col('val')).alias('total')) df1 = df1.withColumnRenamed('key2', 'key2') df3 = df.join(df1, ['key1', 'key2', 'key3'])\ .withColumn('newcol', func.col('val')/func.col('total')) I just wanted to see if anyone else has observed this behavior. I will try to find a code sample that proves the issue; if not, in another 1-2 days I will mark it as not reproducible. > Column names Corrupted in pysaprk dataframe groupBy > --- > > Key: SPARK-17908 > URL: https://issues.apache.org/jira/browse/SPARK-17908 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 1.6.0, 1.6.1, 1.6.2, 2.0.0, 2.0.1 >Reporter: Harish >Priority: Minor > > I have DF say df > df1= df.groupBy('key1', 'key2', > 'key3').agg(func.count(func.col('val')).alias('total')) > df3 =df.join(df1, ['key1', 'key2', 'key3'])\ > .withcolumn('newcol', func.col('val')/func.col('total')) > I am getting key2 is not present in df1, which is not truw becuase df1.show > () is having the data with the key2. > Then i added this code before join-- df1 = df1.columnRenamed('key2', 'key2') > renamed with same name. Then it works. > Stack trace will say column missing, but it is npt. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
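A self-contained sketch of the steps above, with the rename workaround applied before the join. The column names and the 'ABC' literal come from the comment; the tiny in-memory DataFrame and the func.lit call are added here only so the snippet actually runs.
{code}
from pyspark.sql import SparkSession
import pyspark.sql.functions as func

spark = SparkSession.builder.appName("spark-17908-sketch").getOrCreate()

# Tiny stand-in for the 70M-record DataFrame described in the comment.
df = spark.createDataFrame(
    [("a", "x", "p", 1.0), ("a", "x", "p", 2.0), ("b", "y", "q", 3.0)],
    ["key1", "key2", "key3", "val"],
)

df = df.withColumn("key2", func.lit("ABC"))

df1 = (df.groupBy("key1", "key2", "key3")
         .agg(func.count(func.col("val")).alias("total")))

# Reported workaround: rename the column to its own name before the join.
df1 = df1.withColumnRenamed("key2", "key2")

df3 = (df.join(df1, ["key1", "key2", "key3"])
         .withColumn("newcol", func.col("val") / func.col("total")))
df3.show()
{code}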
[jira] [Comment Edited] (SPARK-17908) Column names Corrupted in pysaprk dataframe groupBy
[ https://issues.apache.org/jira/browse/SPARK-17908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572373#comment-15572373 ] Harish edited comment on SPARK-17908 at 10/13/16 4:17 PM: -- Sorry, I did not put the actual column names from my code into the stack trace; I edited the first line of the stack trace to substitute the column names. To confirm: I can see the column name in the list of input columns, e.g. key2#20202 missing from key2#20202, key1#key2#23723, key3#20342, etc. was (Author: harishk15): Sorry.. I didnt put the actual column names of my code in stack trace, i have modified the first line of stack trace for the column names. > Column names Corrupted in pysaprk dataframe groupBy > --- > > Key: SPARK-17908 > URL: https://issues.apache.org/jira/browse/SPARK-17908 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 1.6.0, 1.6.1, 1.6.2, 2.0.0, 2.0.1 >Reporter: Harish >Priority: Minor > > I have DF say df > df1= df.groupBy('key1', 'key2', > 'key3').agg(func.count(func.col('val')).alias('total')) > df3 =df.join(df1, ['key1', 'key2', 'key3'])\ > .withcolumn('newcol', func.col('val')/func.col('total')) > I am getting key2 is not present in df1, which is not truw becuase df1.show > () is having the data with the key2. > Then i added this code before join-- df1 = df1.columnRenamed('key2', 'key2') > renamed with same name. Then it works. > Stack trace will say column missing, but it is npt. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17908) Column names Corrupted in pysaprk dataframe groupBy
[ https://issues.apache.org/jira/browse/SPARK-17908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572373#comment-15572373 ] Harish commented on SPARK-17908: Sorry.. I didnt put the actual column names of my code in stack trace, i have modified the first line of stack trace for the column names. > Column names Corrupted in pysaprk dataframe groupBy > --- > > Key: SPARK-17908 > URL: https://issues.apache.org/jira/browse/SPARK-17908 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 1.6.0, 1.6.1, 1.6.2, 2.0.0, 2.0.1 >Reporter: Harish >Priority: Minor > > I have DF say df > df1= df.groupBy('key1', 'key2', > 'key3').agg(func.count(func.col('val')).alias('total')) > df3 =df.join(df1, ['key1', 'key2', 'key3'])\ > .withcolumn('newcol', func.col('val')/func.col('total')) > I am getting key2 is not present in df1, which is not truw becuase df1.show > () is having the data with the key2. > Then i added this code before join-- df1 = df1.columnRenamed('key2', 'key2') > renamed with same name. Then it works. > Stack trace will say column missing, but it is npt. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-17908) Column names Corrupted in pysaprk dataframe groupBy
[ https://issues.apache.org/jira/browse/SPARK-17908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572278#comment-15572278 ] Harish edited comment on SPARK-17908 at 10/13/16 4:13 PM: -- Traceback (most recent call last): File "/home/hpcuser/iri/spark-2.0.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco File "/home/hpcuser/iri/spark-2.0.1-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py", line 319, in get_return_value py4j.protocol.Py4JJavaError: An error occurred while calling o376.select. : org.apache.spark.sql.AnalysisException: cannot resolve '`key2`' given input columns: ['key1', 'key2', 'key3', 'total' and df coumns]; at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:77) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:74) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:301) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:301) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69) at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:300) at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionUp$1(QueryPlan.scala:191) at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:201) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2$1.apply(QueryPlan.scala:205) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:205) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$5.apply(QueryPlan.scala:210) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179) at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:210) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:74) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:67) at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:126) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:67) at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:58) at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49) at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64) at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withPlan(Dataset.scala:2603) at org.apache.spark.sql.Dataset.select(Dataset.scala:969) at sun.reflect.GeneratedMethodAccessor52.invoke(Unknown Source) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:280) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:214) at java.lang.Thread.run(Thread.java:745) was (Author: harishk15): Traceback (most recent call last): File "/home/hpcuser/iri/spark-2.0.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco File "/home/hpcuser/iri/spark-2.0.1-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py", line 319, in get_return_value py4j.protocol.Py4JJavaError: An error occurred while calling o376.select. : org.apache.spark.sql.AnalysisException: cannot resolve '`key2`' given input columns: ['key1', 'key2', 'key3', 'total' and
[jira] [Comment Edited] (SPARK-17908) Column names Corrupted in pysaprk dataframe groupBy
[ https://issues.apache.org/jira/browse/SPARK-17908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572278#comment-15572278 ] Harish edited comment on SPARK-17908 at 10/13/16 4:12 PM: -- Traceback (most recent call last): File "/home/hpcuser/iri/spark-2.0.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco File "/home/hpcuser/iri/spark-2.0.1-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py", line 319, in get_return_value py4j.protocol.Py4JJavaError: An error occurred while calling o376.select. : org.apache.spark.sql.AnalysisException: cannot resolve '`key2`' given input columns: ['key1', 'key2', 'key3', 'total' and df1 coumns]; at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:77) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:74) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:301) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:301) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69) at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:300) at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionUp$1(QueryPlan.scala:191) at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:201) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2$1.apply(QueryPlan.scala:205) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:205) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$5.apply(QueryPlan.scala:210) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179) at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:210) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:74) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:67) at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:126) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:67) at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:58) at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49) at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64) at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withPlan(Dataset.scala:2603) at org.apache.spark.sql.Dataset.select(Dataset.scala:969) at sun.reflect.GeneratedMethodAccessor52.invoke(Unknown Source) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:280) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:214) at java.lang.Thread.run(Thread.java:745) was (Author: harishk15): Traceback (most recent call last): File "/home/hpcuser/iri/spark-2.0.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco File "/home/hpcuser/iri/spark-2.0.1-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py", line 319, in get_return_value py4j.protocol.Py4JJavaError: An error occurred while calling o376.select. : org.apache.spark.sql.AnalysisException: cannot resolve '`key2`' given input columns: ['key1', 'key2', 'key3', 'total'];
[jira] [Comment Edited] (SPARK-17908) Column names Corrupted in pysaprk dataframe groupBy
[ https://issues.apache.org/jira/browse/SPARK-17908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572278#comment-15572278 ] Harish edited comment on SPARK-17908 at 10/13/16 4:09 PM: -- Traceback (most recent call last): File "/home/hpcuser/iri/spark-2.0.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco File "/home/hpcuser/iri/spark-2.0.1-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py", line 319, in get_return_value py4j.protocol.Py4JJavaError: An error occurred while calling o376.select. : org.apache.spark.sql.AnalysisException: cannot resolve '`key2`' given input columns: ['key1', 'key2', 'key3', 'total']; at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:77) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:74) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:301) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:301) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69) at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:300) at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionUp$1(QueryPlan.scala:191) at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:201) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2$1.apply(QueryPlan.scala:205) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:205) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$5.apply(QueryPlan.scala:210) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179) at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:210) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:74) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:67) at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:126) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:67) at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:58) at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49) at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64) at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withPlan(Dataset.scala:2603) at org.apache.spark.sql.Dataset.select(Dataset.scala:969) at sun.reflect.GeneratedMethodAccessor52.invoke(Unknown Source) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:280) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:214) at java.lang.Thread.run(Thread.java:745) was (Author: harishk15): Traceback (most recent call last): File "/home/hpcuser/iri/spark-2.0.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco File "/home/hpcuser/iri/spark-2.0.1-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py", line 319, in get_return_value py4j.protocol.Py4JJavaError: An error occurred while calling o376.select. : org.apache.spark.sql.AnalysisException: cannot resolve '`key2`' given input columns: [columns]; at
[jira] [Commented] (SPARK-17908) Column names Corrupted in pysaprk dataframe groupBy
[ https://issues.apache.org/jira/browse/SPARK-17908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572278#comment-15572278 ] Harish commented on SPARK-17908: Traceback (most recent call last): File "/home/hpcuser/iri/spark-2.0.1-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco File "/home/hpcuser/iri/spark-2.0.1-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py", line 319, in get_return_value py4j.protocol.Py4JJavaError: An error occurred while calling o376.select. : org.apache.spark.sql.AnalysisException: cannot resolve '`key2`' given input columns: [columns]; at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:77) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:74) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:301) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:301) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69) at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:300) at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionUp$1(QueryPlan.scala:191) at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:201) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2$1.apply(QueryPlan.scala:205) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at scala.collection.AbstractTraversable.map(Traversable.scala:104) at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$2(QueryPlan.scala:205) at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$5.apply(QueryPlan.scala:210) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179) at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:210) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:74) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:67) at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:126) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:67) at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:58) at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49) at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64) at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withPlan(Dataset.scala:2603) at org.apache.spark.sql.Dataset.select(Dataset.scala:969) at sun.reflect.GeneratedMethodAccessor52.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at 
java.lang.reflect.Method.invoke(Method.java:498) at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237) at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at py4j.Gateway.invoke(Gateway.java:280) at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at py4j.commands.CallCommand.execute(CallCommand.java:79) at py4j.GatewayConnection.run(GatewayConnection.java:214) at java.lang.Thread.run(Thread.java:745) > Column names Corrupted in pysaprk dataframe groupBy > --- > > Key: SPARK-17908 > URL: https://issues.apache.org/jira/browse/SPARK-17908 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 1.6.0, 1.6.1, 1.6.2, 2.0.0, 2.0.1 >Reporter: Harish >Priority: Minor > > I have DF say df > df1= df.groupBy('key1', 'key2', > 'key3').agg(func.count(func.col('val')).alias('total')) > df3 =df.join(df1, ['key1', 'key2', 'key3'])\ >
[jira] [Created] (SPARK-17908) Column names Corrupted in pysaprk dataframe groupBy
Harish created SPARK-17908: -- Summary: Column names Corrupted in pysaprk dataframe groupBy Key: SPARK-17908 URL: https://issues.apache.org/jira/browse/SPARK-17908 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 2.0.1, 2.0.0, 1.6.2, 1.6.1, 1.6.0 Reporter: Harish Priority: Minor I have a DataFrame, say df: df1 = df.groupBy('key1', 'key2', 'key3').agg(func.count(func.col('val')).alias('total')) df3 = df.join(df1, ['key1', 'key2', 'key3'])\ .withColumn('newcol', func.col('val')/func.col('total')) I get an error saying key2 is not present in df1, which is not true, because df1.show() displays data with key2. Then I added this line before the join -- df1 = df1.withColumnRenamed('key2', 'key2') -- renaming the column to the same name, and it works. The stack trace says the column is missing, but it is not. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-17901) NettyRpcEndpointRef: Error sending message and Caused by: java.util.ConcurrentModificationException
Harish created SPARK-17901: -- Summary: NettyRpcEndpointRef: Error sending message and Caused by: java.util.ConcurrentModificationException Key: SPARK-17901 URL: https://issues.apache.org/jira/browse/SPARK-17901 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 2.0.1 Reporter: Harish I have 2 data frames one with 10K rows and 10,000 columns and another with 4M rows with 50 columns. I joined this and trying to find mean of merged data set, i calculated the mean using lamda using python mean() function. I cant write in pyspark due to 64KB code limit issue. After calculating the mean i did rdd.take(2). it works.But creating the DF from RDD and DF.show is progress for more than 2 hours (I stopped the process) with below message (102 GB , 6 cores per node -- total 10 nodes+ 1master) 16/10/13 04:36:31 WARN NettyRpcEndpointRef: Error sending message [message = Heartbeat(9,[Lscala.Tuple2;@68eedafb,BlockManagerId(9, 10.63.136.103, 35729))] in 1 attempts org.apache.spark.SparkException: Exception thrown in awaitResult at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75) at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167) at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83) at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:518) at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:547) at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:547) at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:547) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1877) at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:547) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.util.ConcurrentModificationException at java.util.ArrayList.writeObject(ArrayList.java:766) at sun.reflect.GeneratedMethodAccessor36.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1028) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) at 
java.io.ObjectOutputStream.defaultWriteObject(ObjectOutputStream.java:441) at java.util.Collections$SynchronizedCollection.writeObject(Collections.java:2081) at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1028) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at
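The report above describes computing the per-column means on the RDD side because wide DataFrame aggregations were hitting the 64KB codegen limit. A heavily hedged sketch of that idea follows; wide_df stands in for the joined data set, the exact reduction differs from the original lambda-based code, and the null handling is deliberately simplistic.
{code}
# 'wide_df' is a placeholder for the joined DataFrame from the description above.
numeric_cols = [c for c, t in wide_df.dtypes if t in ("int", "bigint", "float", "double")]

def as_vector(row):
    # Treat missing values as 0.0 purely to keep the sketch short.
    return [float(row[c]) if row[c] is not None else 0.0 for c in numeric_cols]

count = wide_df.count()
sums = wide_df.rdd.map(as_vector).reduce(lambda a, b: [x + y for x, y in zip(a, b)])
means = dict(zip(numeric_cols, (s / count for s in sums)))
{code}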
[jira] [Comment Edited] (SPARK-17463) Serialization of accumulators in heartbeats is not thread-safe
[ https://issues.apache.org/jira/browse/SPARK-17463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566515#comment-15566515 ] Harish edited comment on SPARK-17463 at 10/11/16 8:34 PM: -- My second approach was: def testfunc(keys, vals, columnsToStandardize): df= pd.DataFrame(vals, columns = keys) df[columnsToStandardize] = df[columnsToStandardize] - df[columnsToStandardize].mean() df3.rdd.map(keys).groupByKey().flatMap(lambda keyval: testfunc(keys[0], keys[1], columnsToStandardize)) was (Author: harishk15): My second approach was: def testfunc(keys, vals, columnsToStandardize): df= pd.DataFrame(vals, columns = keys) df[columnsToStandardize] = df[columnsToStandardize] - df[columnsToStandardize].mean() df3.rdd.map(keys).groupByKey().flatMap(lambda keyval: testfunc(keys[0], keys[1], columnsToStandardize)) > Serialization of accumulators in heartbeats is not thread-safe > -- > > Key: SPARK-17463 > URL: https://issues.apache.org/jira/browse/SPARK-17463 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Josh Rosen >Assignee: Shixiong Zhu >Priority: Critical > Fix For: 2.0.1, 2.1.0 > > > Check out the following {{ConcurrentModificationException}}: > {code} > 16/09/06 16:10:29 WARN NettyRpcEndpointRef: Error sending message [message = > Heartbeat(2,[Lscala.Tuple2;@66e7b6e7,BlockManagerId(2, HOST, 57743))] in 1 > attempts > org.apache.spark.SparkException: Exception thrown in awaitResult > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at scala.PartialFunction$OrElse.apply(PartialFunction.scala:162) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) > at > org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:518) > at > org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:547) > at > org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:547) > at > org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:547) > at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1862) > at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:547) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.util.ConcurrentModificationException > at java.util.ArrayList.writeObject(ArrayList.java:766) > at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) > at > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1028) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at >
[jira] [Commented] (SPARK-17463) Serialization of accumulators in heartbeats is not thread-safe
[ https://issues.apache.org/jira/browse/SPARK-17463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566515#comment-15566515 ] Harish commented on SPARK-17463: My second approach was: def testfunc(keys, vals, columnsToStandardize): df= pd.DataFrame(vals, columns = keys) df[columnsToStandardize] = df[columnsToStandardize] - df[columnsToStandardize].mean() df3.rdd.map(keys).groupByKey().flatMap(lambda keyval: testfunc(keys[0], keys[1], columnsToStandardize)) > Serialization of accumulators in heartbeats is not thread-safe > -- > > Key: SPARK-17463 > URL: https://issues.apache.org/jira/browse/SPARK-17463 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Josh Rosen >Assignee: Shixiong Zhu >Priority: Critical > Fix For: 2.0.1, 2.1.0 > > > Check out the following {{ConcurrentModificationException}}: > {code} > 16/09/06 16:10:29 WARN NettyRpcEndpointRef: Error sending message [message = > Heartbeat(2,[Lscala.Tuple2;@66e7b6e7,BlockManagerId(2, HOST, 57743))] in 1 > attempts > org.apache.spark.SparkException: Exception thrown in awaitResult > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at scala.PartialFunction$OrElse.apply(PartialFunction.scala:162) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) > at > org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:518) > at > org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:547) > at > org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:547) > at > org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:547) > at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1862) > at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:547) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.util.ConcurrentModificationException > at java.util.ArrayList.writeObject(ArrayList.java:766) > at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1028) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at 
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at >
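The groupByKey/flatMap pseudocode in the comment above does not run as written, so here is one way the same idea could look: key the rows by the grouping columns, build a pandas frame per group, and subtract the per-group means from the columns to standardize. The key_cols and value_cols names are illustrative, not taken from the original code; df3 refers to the DataFrame mentioned in the comment.
{code}
import pandas as pd

key_cols = ["key1", "key2", "key3"]   # illustrative grouping columns
value_cols = ["c1", "c2"]             # illustrative columns to standardize

def center_group(rows):
    # Build a pandas frame for one group and subtract the group means.
    pdf = pd.DataFrame([r.asDict() for r in rows])
    pdf[value_cols] = pdf[value_cols] - pdf[value_cols].mean()
    return pdf.to_dict("records")

centered = (
    df3.rdd
       .map(lambda r: (tuple(r[k] for k in key_cols), r))
       .groupByKey()
       .flatMap(lambda kv: center_group(kv[1]))
)
{code}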
[jira] [Comment Edited] (SPARK-17463) Serialization of accumulators in heartbeats is not thread-safe
[ https://issues.apache.org/jira/browse/SPARK-17463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566427#comment-15566427 ] Harish edited comment on SPARK-17463 at 10/11/16 8:06 PM: -- No i dont have any code like that. I use pyspark .. Please find my code snippet df1 with 60 columns (70M records) df2 with 3000-7000 (varies) columns (10M) join df1 and df2 with key columns (please note df1 is more granular data and df2 one level above. So data set will grow df3 = df1.join(df2, [keys]) aggList = [func.mean(col).alias(col + '_m') for col in df2.columns] Last part is i do -- df4 = df3.groupBy(keys).agg(*aggList) --I applying mean to each column of the df3 data frame which might be 3000-1 columns. Let me know if you need entire stack trace of this issue. PS: We still have issue https://issues.apache.org/jira/browse/SPARK-16845 -- So i have to break number of columns 500 chunks was (Author: harishk15): No i dont have any code like that. I use pyspark .. Please find my code snippet df1 with 60 columns (70M records) df2 with 3000-7000 (varies) columns (10M) join df1 and df2 with key columns (please note df1 is more granular data and df2 one level above. So data set will grow df3 = df1.join(df2, [keys]) aggList = [func.mean(col).alias(col + '_m') for col in df2.columns] Last part is i do -- df4 = df3.groupBy(keys).agg(*aggList) --I applying mean to each column of the df3 data frame which might be 3000-1 columns. PS: We still have issue https://issues.apache.org/jira/browse/SPARK-16845 -- So i have to break number of columns 500 chunks > Serialization of accumulators in heartbeats is not thread-safe > -- > > Key: SPARK-17463 > URL: https://issues.apache.org/jira/browse/SPARK-17463 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Josh Rosen >Assignee: Shixiong Zhu >Priority: Critical > Fix For: 2.0.1, 2.1.0 > > > Check out the following {{ConcurrentModificationException}}: > {code} > 16/09/06 16:10:29 WARN NettyRpcEndpointRef: Error sending message [message = > Heartbeat(2,[Lscala.Tuple2;@66e7b6e7,BlockManagerId(2, HOST, 57743))] in 1 > attempts > org.apache.spark.SparkException: Exception thrown in awaitResult > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at scala.PartialFunction$OrElse.apply(PartialFunction.scala:162) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) > at > org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:518) > at > org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:547) > at > org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:547) > at > org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:547) > at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1862) > at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:547) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at 
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.util.ConcurrentModificationException > at java.util.ArrayList.writeObject(ArrayList.java:766) > at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1028) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496) > at >
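The comment above mentions breaking the aggregation into chunks of roughly 500 columns to stay under the 64KB codegen limit tracked in SPARK-16845. A sketch of that chunking, assuming df3 and keys exist as in the comment; the chunk size and the join-based reassembly are illustrative choices, not the author's exact code.
{code}
import pyspark.sql.functions as func

chunk_size = 500  # chunk size mentioned in the comment; tune as needed
value_cols = [c for c in df3.columns if c not in keys]

result = None
for start in range(0, len(value_cols), chunk_size):
    chunk = value_cols[start:start + chunk_size]
    agg_list = [func.mean(c).alias(c + "_m") for c in chunk]
    part = df3.groupBy(*keys).agg(*agg_list)
    # Stitch the per-chunk aggregates back together on the key columns.
    result = part if result is None else result.join(part, keys)
{code}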
[jira] [Commented] (SPARK-17463) Serialization of accumulators in heartbeats is not thread-safe
[ https://issues.apache.org/jira/browse/SPARK-17463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566427#comment-15566427 ] Harish commented on SPARK-17463: No i dont have any code like that. I use pyspark .. Please find my code snippet df1 with 60 columns (70M records) df2 with 3000-7000 (varies) columns (10M) join df1 and df2 with key columns (please note df1 is more granular data and df2 one level above. So data set will grow df3 = df1.join(df2, [keys]) aggList = [func.mean(col).alias(col + '_m') for col in df2.columns] Last part is i do -- df4 = df3.groupBy(keys).agg(*aggList) --I applying mean to each column of the df3 data frame which might be 3000-1 columns. PS: We still have issue https://issues.apache.org/jira/browse/SPARK-16845 -- So i have to break number of columns 500 chunks > Serialization of accumulators in heartbeats is not thread-safe > -- > > Key: SPARK-17463 > URL: https://issues.apache.org/jira/browse/SPARK-17463 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Josh Rosen >Assignee: Shixiong Zhu >Priority: Critical > Fix For: 2.0.1, 2.1.0 > > > Check out the following {{ConcurrentModificationException}}: > {code} > 16/09/06 16:10:29 WARN NettyRpcEndpointRef: Error sending message [message = > Heartbeat(2,[Lscala.Tuple2;@66e7b6e7,BlockManagerId(2, HOST, 57743))] in 1 > attempts > org.apache.spark.SparkException: Exception thrown in awaitResult > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at scala.PartialFunction$OrElse.apply(PartialFunction.scala:162) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) > at > org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:518) > at > org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:547) > at > org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:547) > at > org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:547) > at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1862) > at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:547) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.util.ConcurrentModificationException > at java.util.ArrayList.writeObject(ArrayList.java:766) > at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 
at java.lang.reflect.Method.invoke(Method.java:497) > at > java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1028) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at >
[jira] [Commented] (SPARK-17463) Serialization of accumulators in heartbeats is not thread-safe
[ https://issues.apache.org/jira/browse/SPARK-17463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566413#comment-15566413 ] Harish commented on SPARK-17463: Ok. thanks for the update. Do we have any work around for the second part of the issue? I tried this set("spark.rpc.netty.dispatcher.numThreads","2") but no luck > Serialization of accumulators in heartbeats is not thread-safe > -- > > Key: SPARK-17463 > URL: https://issues.apache.org/jira/browse/SPARK-17463 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Josh Rosen >Assignee: Shixiong Zhu >Priority: Critical > Fix For: 2.0.1, 2.1.0 > > > Check out the following {{ConcurrentModificationException}}: > {code} > 16/09/06 16:10:29 WARN NettyRpcEndpointRef: Error sending message [message = > Heartbeat(2,[Lscala.Tuple2;@66e7b6e7,BlockManagerId(2, HOST, 57743))] in 1 > attempts > org.apache.spark.SparkException: Exception thrown in awaitResult > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75) > at > scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at > org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) > at scala.PartialFunction$OrElse.apply(PartialFunction.scala:162) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) > at > org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:518) > at > org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:547) > at > org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:547) > at > org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:547) > at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1862) > at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:547) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.util.ConcurrentModificationException > at java.util.ArrayList.writeObject(ArrayList.java:766) > at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1028) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at > 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at > java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at
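For reference, the setting mentioned in the comment above is an internal RPC option that would normally be applied through SparkConf (or via --conf on spark-submit) before the context is created. The following is only a minimal sketch of how it would be set; the application name is a placeholder, and, as the comment itself reports, this setting did not resolve the heartbeat serialization failure.
{code}
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Sketch only: apply the dispatcher-thread setting tried in the comment above.
// "heartbeat-workaround-test" is a placeholder app name, not from the report.
val conf = new SparkConf()
  .setAppName("heartbeat-workaround-test")
  .set("spark.rpc.netty.dispatcher.numThreads", "2")

val spark = SparkSession.builder().config(conf).getOrCreate()
{code}
The same value could be passed on the command line as --conf spark.rpc.netty.dispatcher.numThreads=2 when submitting the job.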
[jira] [Commented] (SPARK-17463) Serialization of accumulators in heartbeats is not thread-safe
[ https://issues.apache.org/jira/browse/SPARK-17463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566206#comment-15566206 ] Harish commented on SPARK-17463: Is this fix part of the https://github.com/apache/spark/pull/15371 pull request? I have 2.0.1 on my cluster, but I am getting both errors. 16/10/11 00:53:42 WARN NettyRpcEndpointRef: Error sending message [message = Heartbeat(2,[Lscala.Tuple2;@43f45f95,BlockManagerId(2, HOST, 43256))] in 1 attempts org.apache.spark.SparkException: Exception thrown in awaitResult at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77) at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75) at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36) at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59) at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167) at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83) at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:518) at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:547) at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:547) at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:547) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1877) at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:547) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.util.ConcurrentModificationException at java.util.ArrayList.writeObject(ArrayList.java:766) at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1028) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) at java.io.ObjectOutputStream.defaultWriteObject(ObjectOutputStream.java:441) at java.util.Collections$SynchronizedCollection.writeObject(Collections.java:2081) at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1028) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496) at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174) at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1378) at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1174) at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) at
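The ConcurrentModificationException in both traces comes from Java serialization walking a java.util.ArrayList (presumably the accumulator-update collection carried by the heartbeat) while another thread mutates it. The following is a minimal standalone sketch of that failure mode, not taken from the Spark code base; the race is timing-dependent, so it may need a few runs to trigger.
{code}
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Illustration only: serialize a java.util.ArrayList while a second thread
// keeps appending to it. ArrayList.writeObject checks modCount and can throw
// ConcurrentModificationException, as in the stack traces above.
val list = new java.util.ArrayList[Int]()
(1 to 100000).foreach(i => list.add(i))

val writer = new Thread(new Runnable {
  override def run(): Unit = {
    val oos = new ObjectOutputStream(new ByteArrayOutputStream())
    oos.writeObject(list) // may throw ConcurrentModificationException
    oos.close()
  }
})
val mutator = new Thread(new Runnable {
  override def run(): Unit = (1 to 100000).foreach(i => list.add(i))
})

writer.start(); mutator.start()
writer.join(); mutator.join()
{code}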
[jira] [Commented] (SPARK-17463) Serialization of accumulators in heartbeats is not thread-safe
[ https://issues.apache.org/jira/browse/SPARK-17463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565493#comment-15565493 ] Harish commented on SPARK-17463: It looks like a showstopper for my current project. Can you please let me know the 2.1.0 release date, or does the same issue exist in 1.6.0/1/2?
[jira] [Comment Edited] (SPARK-17463) Serialization of accumulators in heartbeats is not thread-safe
[ https://issues.apache.org/jira/browse/SPARK-17463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565493#comment-15565493 ] Harish edited comment on SPARK-17463 at 10/11/16 2:03 PM: -- It looks like a showstopper for my current project. Can you please let me know the 2.1.0 release date, or whether the same issue exists in 1.6.0/1/2, so that I can revert back to 1.6? was (Author: harishk15): It looks like a show stopper for my current project. Can you please let me know the 2.1.0 release date or do we have same issue in 1.6.0/1/2 ?