[jira] [Commented] (SPARK-7841) Spark build should not use lib_managed for dependencies
[ https://issues.apache.org/jira/browse/SPARK-7841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791674#comment-14791674 ] Iulian Dragos commented on SPARK-7841: -- Yes, there are a few build scripts (including make-distribution IIRC) that depend on having things in `lib_managed`. For the moment I'm applying a patch locally; I hope to have some time to look at this in the next week or two. > Spark build should not use lib_managed for dependencies > --- > > Key: SPARK-7841 > URL: https://issues.apache.org/jira/browse/SPARK-7841 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 1.3.1 >Reporter: Iulian Dragos > Labels: easyfix, sbt > > - unnecessary duplication (I will have those libraries under ./m2, via maven > anyway) > - every time I call make-distribution I lose lib_managed (via mvn clean > install) and have to wait to download all jars again the next time I use sbt > - Eclipse does not handle relative paths very well (source attachments from > lib_managed don’t always work) > - it's not the default configuration. If we stray from defaults I think there > should be a clear advantage. > Digging through history, the only reference to `retrieveManaged := true` I > found was in f686e3d, from July 2011 ("Initial work on converting build to > SBT 0.10.1"). My guess is that this is purely an accident of porting the build from > sbt 0.7.x and trying to keep the old project layout. > If there are reasons for keeping it, please comment (I didn't get any answers > on the [dev mailing > list|http://apache-spark-developers-list.1001551.n3.nabble.com/Why-use-quot-lib-managed-quot-for-the-Sbt-build-td12361.html]) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
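For context, `retrieveManaged` is the stock sbt setting this ticket is about. A minimal sketch of the relevant build-definition line, shown against a generic hypothetical sbt project rather than Spark's actual SparkBuild.scala:
{code}
// Minimal build.sbt sketch for a hypothetical project (not Spark's real build).
// With retrieveManaged := true, sbt copies every resolved dependency into lib_managed/;
// the sbt default (false) serves jars straight from the Ivy/Maven caches instead.
lazy val example = (project in file("."))
  .settings(
    name := "example",
    retrieveManaged := false // sbt's default; Spark's build currently overrides this to true
  )
{code}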
[jira] [Assigned] (SPARK-10051) Support collecting data of StructType in DataFrame
[ https://issues.apache.org/jira/browse/SPARK-10051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10051: Assignee: (was: Apache Spark) > Support collecting data of StructType in DataFrame > -- > > Key: SPARK-10051 > URL: https://issues.apache.org/jira/browse/SPARK-10051 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Reporter: Sun Rui > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10051) Support collecting data of StructType in DataFrame
[ https://issues.apache.org/jira/browse/SPARK-10051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10051: Assignee: Apache Spark > Support collecting data of StructType in DataFrame > -- > > Key: SPARK-10051 > URL: https://issues.apache.org/jira/browse/SPARK-10051 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Reporter: Sun Rui >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10051) Support collecting data of StructType in DataFrame
[ https://issues.apache.org/jira/browse/SPARK-10051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791617#comment-14791617 ] Apache Spark commented on SPARK-10051: -- User 'sun-rui' has created a pull request for this issue: https://github.com/apache/spark/pull/8794 > Support collecting data of StructType in DataFrame > -- > > Key: SPARK-10051 > URL: https://issues.apache.org/jira/browse/SPARK-10051 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Reporter: Sun Rui > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10659) DataFrames and SparkSQL saveAsParquetFile does not preserve REQUIRED (not nullable) flag in schema
[ https://issues.apache.org/jira/browse/SPARK-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791613#comment-14791613 ] Vladimir Picka commented on SPARK-10659: Here is an unanswered attempt at discussion on the mailing list: https://mail.google.com/mail/#search/label%3Aspark-user+petr/14f64c75c15f5ccd > DataFrames and SparkSQL saveAsParquetFile does not preserve REQUIRED (not > nullable) flag in schema > -- > > Key: SPARK-10659 > URL: https://issues.apache.org/jira/browse/SPARK-10659 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.3.0, 1.3.1, 1.4.0, 1.4.1, 1.5.0 >Reporter: Vladimir Picka > > DataFrames currently automatically promote all Parquet schema fields to > optional when they are written to an empty directory. The problem remains in > v1.5.0. > The culprit is this code: > val relation = if (doInsertion) { > // This is a hack. We always set > nullable/containsNull/valueContainsNull to true > // for the schema of a parquet data. > val df = > sqlContext.createDataFrame( > data.queryExecution.toRdd, > data.schema.asNullable) > val createdRelation = > createRelation(sqlContext, parameters, > df.schema).asInstanceOf[ParquetRelation2] > createdRelation.insert(df, overwrite = mode == SaveMode.Overwrite) > createdRelation > } > which was implemented as part of this PR: > https://github.com/apache/spark/commit/1b490e91fd6b5d06d9caeb50e597639ccfc0bc3b > This is very unexpected behaviour for some use cases where files are read from > one place and written to another, like small-file packing - it ends up with > incompatible files because required can't normally be promoted to optional. > It is the essence of a schema that it enforces the "required" invariant on data. It > should be assumed that this is intended. > I believe that a better approach is to have the default behaviour keep the schema > as is and provide e.g. a builder method or option to allow forcing fields to > optional. > Right now we have to override a private API so that our files are rewritten as > is, with all its perils. > Vladimir -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10659) DataFrames and SparkSQL saveAsParquetFile does not preserve REQUIRED (not nullable) flag in schema
[ https://issues.apache.org/jira/browse/SPARK-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Picka updated SPARK-10659: --- Summary: DataFrames and SparkSQL saveAsParquetFile does not preserve REQUIRED (not nullable) flag in schema (was: SparkSQL saveAsParquetFile does not preserve REQUIRED (not nullable) flag in schema) > DataFrames and SparkSQL saveAsParquetFile does not preserve REQUIRED (not > nullable) flag in schema > -- > > Key: SPARK-10659 > URL: https://issues.apache.org/jira/browse/SPARK-10659 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.3.0, 1.3.1, 1.4.0, 1.4.1, 1.5.0 >Reporter: Vladimir Picka > > DataFrames currently automatically promote all Parquet schema fields to > optional when they are written to an empty directory. The problem remains in > v1.5.0. > The culprit is this code: > val relation = if (doInsertion) { > // This is a hack. We always set > nullable/containsNull/valueContainsNull to true > // for the schema of a parquet data. > val df = > sqlContext.createDataFrame( > data.queryExecution.toRdd, > data.schema.asNullable) > val createdRelation = > createRelation(sqlContext, parameters, > df.schema).asInstanceOf[ParquetRelation2] > createdRelation.insert(df, overwrite = mode == SaveMode.Overwrite) > createdRelation > } > which was implemented as part of this PR: > https://github.com/apache/spark/commit/1b490e91fd6b5d06d9caeb50e597639ccfc0bc3b > This is very unexpected behaviour for some use cases where files are read from > one place and written to another, like small-file packing - it ends up with > incompatible files because required can't normally be promoted to optional. > It is the essence of a schema that it enforces the "required" invariant on data. It > should be assumed that this is intended. > I believe that a better approach is to have the default behaviour keep the schema > as is and provide e.g. a builder method or option to allow forcing fields to > optional. > Right now we have to override a private API so that our files are rewritten as > is, with all its perils. > Vladimir -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10659) SparkSQL saveAsParquetFile does not preserve REQUIRED (not nullable) flag in schema
Vladimir Picka created SPARK-10659: -- Summary: SparkSQL saveAsParquetFile does not preserve REQUIRED (not nullable) flag in schema Key: SPARK-10659 URL: https://issues.apache.org/jira/browse/SPARK-10659 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.5.0, 1.4.1, 1.4.0, 1.3.1, 1.3.0 Reporter: Vladimir Picka DataFrames currently automatically promote all Parquet schema fields to optional when they are written to an empty directory. The problem remains in v1.5.0. The culprit is this code: val relation = if (doInsertion) { // This is a hack. We always set nullable/containsNull/valueContainsNull to true // for the schema of a parquet data. val df = sqlContext.createDataFrame( data.queryExecution.toRdd, data.schema.asNullable) val createdRelation = createRelation(sqlContext, parameters, df.schema).asInstanceOf[ParquetRelation2] createdRelation.insert(df, overwrite = mode == SaveMode.Overwrite) createdRelation } which was implemented as part of this PR: https://github.com/apache/spark/commit/1b490e91fd6b5d06d9caeb50e597639ccfc0bc3b This is very unexpected behaviour for some use cases where files are read from one place and written to another, like small-file packing - it ends up with incompatible files because required can't normally be promoted to optional. It is the essence of a schema that it enforces the "required" invariant on data. It should be assumed that this is intended. I believe that a better approach is to have the default behaviour keep the schema as is and provide e.g. a builder method or option to allow forcing fields to optional. Right now we have to override a private API so that our files are rewritten as is, with all its perils. Vladimir -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
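To make the reported behaviour concrete, here is a small reproduction sketch, assuming a 1.4/1.5-era SQLContext and an illustrative output path (this is not code from the ticket): a field declared non-nullable comes back from Parquet as nullable.
{code}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Assumes an existing SparkContext `sc`; the output path below is illustrative.
val sqlContext = new SQLContext(sc)
val schema = StructType(Seq(StructField("id", StringType, nullable = false)))
val df = sqlContext.createDataFrame(sc.parallelize(Seq(Row("a"), Row("b"))), schema)

df.write.parquet("/tmp/required-demo") // the REQUIRED field is written as optional
val readBack = sqlContext.read.parquet("/tmp/required-demo")
readBack.schema.foreach(f => println(s"${f.name} nullable=${f.nullable}")) // prints nullable=true
{code}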
[jira] [Updated] (SPARK-10658) Could pyspark provide addJars() as scala spark API?
[ https://issues.apache.org/jira/browse/SPARK-10658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ryanchou updated SPARK-10658: - Description: My spark program was written with the pyspark API, and it uses the spark-csv jar library. I can submit the task with spark-submit, and add the `--jars` argument to use the spark-csv jar library, as in the following command: ``` /bin/spark-submit --jars /path/spark-csv_2.10-1.1.0.jar xxx.py ``` However I need to run my unit tests like: ``` py.test -vvs test_xxx.py ``` It couldn't add jars via the '--jars' argument. Therefore I tried to use the SparkContext.addPyFile() API to add jars in my test_xxx.py. Because I saw that addPyFile()'s doc mentions PACKAGES_EXTENSIONS = (.zip, .py, .jar). Does it mean that I can add *.jar (jar libraries) by using addPyFile()? The code which uses addPyFile() to add jars is below: ``` self.sc.addPyFile(join(lib_path, "spark-csv_2.10-1.1.0.jar")) sqlContext = SQLContext(self.sc) self.dataframe = sqlContext.load( source="com.databricks.spark.csv", header="true", path="xxx.csv" ) ``` However it doesn't work: sqlContext cannot load the source (com.databricks.spark.csv). Eventually I found another way: setting the environment variable SPARK_CLASSPATH to load the jar libraries ``` SPARK_CLASSPATH="/path/xxx.jar:/path/xxx2.jar" py.test -vvs test_xxx.py ``` It loads the jar libraries and sqlContext loads the source successfully, just like adding the `--jars xxx1.jar` argument. This is the situation of using third-party jars (.py & .egg work well by using addPyFile()) in pyspark-written scripts, where `--jars` cannot be used (py.test -vvs test_xxx.py). Have you ever planned to provide an API such as addJars() in Scala for adding jars to a Spark program, or is there a better way to add jars that I still haven't found? If someone wants to add jars in pyspark-written scripts without using '--jars', could you give us some suggestions? was: My spark program was written with the pyspark API, and it uses the spark-csv jar library. I can submit the task with spark-submit, and add the `--jars` argument to use the spark-csv jar library, as in the following command: ``` /bin/spark-submit --jars /path/spark-csv_2.10-1.1.0.jar xxx.py ``` However I need to run my unit tests like: ``` py.test -vvs test_xxx.py ``` It couldn't add jars via the '--jars' argument. Therefore I tried to use the SparkContext.addPyFile() API to add jars in my test_xxx.py. Because I saw that addPyFile()'s doc mentions PACKAGES_EXTENSIONS = (.zip, .py, .jar). Does it mean that I can add *.jar (jar libraries) by using addPyFile()? The code which uses addPyFile() to add jars is below: ``` self.sc.addPyFile(join(lib_path, "spark-csv_2.10-1.1.0.jar")) sqlContext = SQLContext(self.sc) self.dataframe = sqlContext.load( source="com.databricks.spark.csv", header="true", path="xxx.csv" ) ``` However it doesn't work: sqlContext cannot load the source (com.databricks.spark.csv). Eventually I found another way: setting the environment variable SPARK_CLASSPATH to load the jar libraries ``` SPARK_CLASSPATH="/path/xxx.jar:/path/xxx2.jar" py.test -vvs test_xxx.py ``` It loads the jar libraries and sqlContext loads the source successfully, just like adding the `--jars xxx1.jar` argument. This is the situation of using third-party jars (.py & .egg work well by using addPyFile()) in pyspark-written scripts, where `--jars` cannot be used (py.test -vvs test_xxx.py). 
Have you ever planned to provide an API such as addJars() in Scala for adding jars to a Spark program, or is there a better way to add jars that I still haven't found? > Could pyspark provide addJars() as scala spark API? > > > Key: SPARK-10658 > URL: https://issues.apache.org/jira/browse/SPARK-10658 > Project: Spark > Issue Type: Wish > Components: PySpark >Affects Versions: 1.3.1 > Environment: Linux ubuntu 14.01 LTS >Reporter: ryanchou > Labels: features > Original Estimate: 48h > Remaining Estimate: 48h > > My spark program was written with the pyspark API, and it uses the spark-csv > jar library. > I can submit the task with spark-submit, and add the `--jars` argument to use the > spark-csv jar library, as in the following command: > ``` > /bin/spark-submit --jars /path/spark-csv_2.10-1.1.0.jar xxx.py > ``` > However I need to run my unit tests like: > ``` > py.test -vvs test_xxx.py > ``` > It couldn't add jars via the '--jars' argument. > Therefore I tried to use the SparkContext.addPyFile() API t
[jira] [Updated] (SPARK-10658) Could pyspark provide addJars() as scala spark API?
[ https://issues.apache.org/jira/browse/SPARK-10658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ryanchou updated SPARK-10658: - Description: My spark program was written with the pyspark API, and it uses the spark-csv jar library. I can submit the task with spark-submit, and add the `--jars` argument to use the spark-csv jar library, as in the following command: ``` /bin/spark-submit --jars /path/spark-csv_2.10-1.1.0.jar xxx.py ``` However I need to run my unit tests like: ``` py.test -vvs test_xxx.py ``` It couldn't add jars via the '--jars' argument. Therefore I tried to use the SparkContext.addPyFile() API to add jars in my test_xxx.py. Because I saw that addPyFile()'s doc mentions PACKAGES_EXTENSIONS = (.zip, .py, .jar). Does it mean that I can add *.jar (jar libraries) by using addPyFile()? The code which uses addPyFile() to add jars is below: ``` self.sc.addPyFile(join(lib_path, "spark-csv_2.10-1.1.0.jar")) sqlContext = SQLContext(self.sc) self.dataframe = sqlContext.load( source="com.databricks.spark.csv", header="true", path="xxx.csv" ) ``` However it doesn't work: sqlContext cannot load the source (com.databricks.spark.csv). Eventually I found another way: setting the environment variable SPARK_CLASSPATH to load the jar libraries ``` SPARK_CLASSPATH="/path/xxx.jar:/path/xxx2.jar" py.test -vvs test_xxx.py ``` It loads the jar libraries and sqlContext loads the source successfully, just like adding the `--jars xxx1.jar` argument. This is the situation of using third-party jars (.py & .egg work well by using addPyFile()) in pyspark-written scripts, where `--jars` cannot be used (py.test -vvs test_xxx.py). Have you ever planned to provide an API such as addJars() in Scala for adding jars to a Spark program, or is there a better way to add jars that I still haven't found? was: My spark program was written with the pyspark API, and it uses the spark-csv jar library. I can submit the task with spark-submit, and add the `--jars` argument to use the spark-csv jar library, as in the following command: ``` /bin/spark-submit --jars /path/spark-csv_2.10-1.1.0.jar xxx.py ``` However I need to run my unit tests like: ``` py.test -vvs test_xxx.py ``` It couldn't add jars via the '--jars' argument. Therefore I tried to use the SparkContext.addPyFile() API to add jars in my test_xxx.py. Because I saw that addPyFile()'s doc mentions PACKAGES_EXTENSIONS = (.zip, .py, .jar). Does it mean that I can add *.jar (jar libraries) by using addPyFile()? The code which uses addPyFile() to add jars is below: ``` self.sc.addPyFile(join(lib_path, "spark-csv_2.10-1.1.0.jar")) sqlContext = SQLContext(self.sc) self.dataframe = sqlContext.load( source="com.databricks.spark.csv", header="true", path="xxx.csv" ) ``` However it doesn't work: sqlContext cannot load the source (com.databricks.spark.csv). Eventually I found another way: setting the environment variable SPARK_CLASSPATH to load the jar libraries ``` SPARK_CLASSPATH="/path/xxx.jar:/path/xxx2.jar" py.test -vvs test_xxx.py ``` It loads the jar libraries and sqlContext loads the source successfully, just like adding the `--jars xxx1.jar` argument. This is the situation of using third-party jars (.py & .egg work well by using addPyFile()) in pyspark-written scripts, where `--jars` cannot be used (py.test -vvs test_xxx.py). Have you ever planned to provide an API such as addJars() in Scala for adding jars to a Spark program, or is there a better way to add jars that I still haven't found? 
> Could pyspark provide addJars() as scala spark API? > > > Key: SPARK-10658 > URL: https://issues.apache.org/jira/browse/SPARK-10658 > Project: Spark > Issue Type: Wish > Components: PySpark >Affects Versions: 1.3.1 > Environment: Linux ubuntu 14.01 LTS >Reporter: ryanchou > Labels: features > Original Estimate: 48h > Remaining Estimate: 48h > > My spark program was written with the pyspark API, and it uses the spark-csv > jar library. > I can submit the task with spark-submit, and add the `--jars` argument to use the > spark-csv jar library, as in the following command: > ``` > /bin/spark-submit --jars /path/spark-csv_2.10-1.1.0.jar xxx.py > ``` > However I need to run my unit tests like: > ``` > py.test -vvs test_xxx.py > ``` > It couldn't add jars via the '--jars' argument. > Therefore I tried to use the SparkContext.addPyFile() API to add jars in my > test_xxx.py. > Because I saw that addPyFile()'s doc mentions PACKAGES_EXTENSIONS = (.zip, > .py, .j
[jira] [Created] (SPARK-10658) Could pyspark provide addJars() as scala spark API?
ryanchou created SPARK-10658: Summary: Could pyspark provide addJars() as scala spark API? Key: SPARK-10658 URL: https://issues.apache.org/jira/browse/SPARK-10658 Project: Spark Issue Type: Wish Components: PySpark Affects Versions: 1.3.1 Environment: Linux ubuntu 14.01 LTS Reporter: ryanchou My spark program was written with the pyspark API, and it uses the spark-csv jar library. I can submit the task with spark-submit, and add the `--jars` argument to use the spark-csv jar library, as in the following command: ``` /bin/spark-submit --jars /path/spark-csv_2.10-1.1.0.jar xxx.py ``` However I need to run my unit tests like: ``` py.test -vvs test_xxx.py ``` It couldn't add jars via the '--jars' argument. Therefore I tried to use the SparkContext.addPyFile() API to add jars in my test_xxx.py. Because I saw that addPyFile()'s doc mentions PACKAGES_EXTENSIONS = (.zip, .py, .jar). Does it mean that I can add *.jar (jar libraries) by using addPyFile()? The code which uses addPyFile() to add jars is below: ``` self.sc.addPyFile(join(lib_path, "spark-csv_2.10-1.1.0.jar")) sqlContext = SQLContext(self.sc) self.dataframe = sqlContext.load( source="com.databricks.spark.csv", header="true", path="xxx.csv" ) ``` However it doesn't work: sqlContext cannot load the source (com.databricks.spark.csv). Eventually I found another way: setting the environment variable SPARK_CLASSPATH to load the jar libraries ``` SPARK_CLASSPATH="/path/xxx.jar:/path/xxx2.jar" py.test -vvs test_xxx.py ``` It loads the jar libraries and sqlContext loads the source successfully, just like adding the `--jars xxx1.jar` argument. This is the situation of using third-party jars (.py & .egg work well by using addPyFile()) in pyspark-written scripts, where `--jars` cannot be used (py.test -vvs test_xxx.py). Have you ever planned to provide an API such as addJars() in Scala for adding jars to a Spark program, or is there a better way to add jars that I still haven't found? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
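For reference, the Scala-side API the report alludes to is SparkContext.addJar; a minimal sketch (master URL and jar path are placeholders, not from the ticket) of the behaviour being requested for pyspark:
{code}
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical standalone snippet; the master URL and jar path are placeholders.
val conf = new SparkConf().setAppName("addJar-demo").setMaster("local[2]")
val sc = new SparkContext(conf)

// Ships the jar to executors so that tasks can load classes from it.
// Note that the driver's own classpath is not changed by addJar.
sc.addJar("/path/spark-csv_2.10-1.1.0.jar")
{code}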
[jira] [Assigned] (SPARK-10657) Remove legacy SCP-based Jenkins log archiving code
[ https://issues.apache.org/jira/browse/SPARK-10657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10657: Assignee: Josh Rosen (was: Apache Spark) > Remove legacy SCP-based Jenkins log archiving code > -- > > Key: SPARK-10657 > URL: https://issues.apache.org/jira/browse/SPARK-10657 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Reporter: Josh Rosen >Assignee: Josh Rosen > > As of https://issues.apache.org/jira/browse/SPARK-7561, we no longer need to > use our custom SCP-based mechanism for archiving Jenkins logs on the master > machine; this has been superseded by the use of a Jenkins plugin which > archives the logs and provides public viewing of them. > We should remove the legacy log syncing code, since this is a blocker to > disabling Worker -> Master SSH on Jenkins. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10657) Remove legacy SCP-based Jenkins log archiving code
[ https://issues.apache.org/jira/browse/SPARK-10657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-10657: --- Description: As of https://issues.apache.org/jira/browse/SPARK-7561, we no longer need to use our custom SCP-based mechanism for archiving Jenkins logs on the master machine; this has been superseded by the use of a Jenkins plugin which archives the logs and provides public viewing of them. We should remove the legacy log syncing code, since this is a blocker to disabling Worker -> Master SSH on Jenkins. was: As of https://issues.apache.org/jira/browse/SPARK-7561, we no longer need to use our custom SSH-based mechanism for archiving Jenkins logs on the master machine; this has been superseded by the use of a Jenkins plugin which archives the logs and provides public viewing of them. We should remove the legacy log syncing code, since this is a blocker to disabling Worker -> Master SSH on Jenkins. > Remove legacy SCP-based Jenkins log archiving code > -- > > Key: SPARK-10657 > URL: https://issues.apache.org/jira/browse/SPARK-10657 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Reporter: Josh Rosen >Assignee: Josh Rosen > > As of https://issues.apache.org/jira/browse/SPARK-7561, we no longer need to > use our custom SCP-based mechanism for archiving Jenkins logs on the master > machine; this has been superseded by the use of a Jenkins plugin which > archives the logs and provides public viewing of them. > We should remove the legacy log syncing code, since this is a blocker to > disabling Worker -> Master SSH on Jenkins. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10657) Remove legacy SCP-based Jenkins log archiving code
[ https://issues.apache.org/jira/browse/SPARK-10657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-10657: --- Summary: Remove legacy SCP-based Jenkins log archiving code (was: Remove legacy SSH-based Jenkins log archiving code) > Remove legacy SCP-based Jenkins log archiving code > -- > > Key: SPARK-10657 > URL: https://issues.apache.org/jira/browse/SPARK-10657 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Reporter: Josh Rosen >Assignee: Josh Rosen > > As of https://issues.apache.org/jira/browse/SPARK-7561, we no longer need to > use our custom SSH-based mechanism for archiving Jenkins logs on the master > machine; this has been superseded by the use of a Jenkins plugin which > archives the logs and provides public viewing of them. > We should remove the legacy log syncing code, since this is a blocker to > disabling Worker -> Master SSH on Jenkins. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10657) Remove legacy SCP-based Jenkins log archiving code
[ https://issues.apache.org/jira/browse/SPARK-10657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791549#comment-14791549 ] Apache Spark commented on SPARK-10657: -- User 'JoshRosen' has created a pull request for this issue: https://github.com/apache/spark/pull/8793 > Remove legacy SCP-based Jenkins log archiving code > -- > > Key: SPARK-10657 > URL: https://issues.apache.org/jira/browse/SPARK-10657 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Reporter: Josh Rosen >Assignee: Josh Rosen > > As of https://issues.apache.org/jira/browse/SPARK-7561, we no longer need to > use our custom SSH-based mechanism for archiving Jenkins logs on the master > machine; this has been superseded by the use of a Jenkins plugin which > archives the logs and provides public viewing of them. > We should remove the legacy log syncing code, since this is a blocker to > disabling Worker -> Master SSH on Jenkins. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10657) Remove legacy SCP-based Jenkins log archiving code
[ https://issues.apache.org/jira/browse/SPARK-10657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10657: Assignee: Apache Spark (was: Josh Rosen) > Remove legacy SCP-based Jenkins log archiving code > -- > > Key: SPARK-10657 > URL: https://issues.apache.org/jira/browse/SPARK-10657 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Reporter: Josh Rosen >Assignee: Apache Spark > > As of https://issues.apache.org/jira/browse/SPARK-7561, we no longer need to > use our custom SCP-based mechanism for archiving Jenkins logs on the master > machine; this has been superseded by the use of a Jenkins plugin which > archives the logs and provides public viewing of them. > We should remove the legacy log syncing code, since this is a blocker to > disabling Worker -> Master SSH on Jenkins. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10657) Remove legacy SSH-based Jenkins log archiving code
Josh Rosen created SPARK-10657: -- Summary: Remove legacy SSH-based Jenkins log archiving code Key: SPARK-10657 URL: https://issues.apache.org/jira/browse/SPARK-10657 Project: Spark Issue Type: Improvement Components: Project Infra Reporter: Josh Rosen Assignee: Josh Rosen As of https://issues.apache.org/jira/browse/SPARK-7561, we no longer need to use our custom SSH-based mechanism for archiving Jenkins logs on the master machine; this has been superseded by the use of a Jenkins plugin which archives the logs and provides public viewing of them. We should remove the legacy log syncing code, since this is a blocker to disabling Worker -> Master SSH on Jenkins. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10606) Cube/Rollup/GrpSet doesn't create the correct plan when group by is on something other than an AttributeReference
[ https://issues.apache.org/jira/browse/SPARK-10606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791499#comment-14791499 ] Cheng Hao commented on SPARK-10606: --- [~rhbutani] Which version are you using? I actually fixed this bug in SPARK-8972, and the fix should be included in 1.5. Can you try with 1.5? > Cube/Rollup/GrpSet doesn't create the correct plan when group by is on > something other than an AttributeReference > - > > Key: SPARK-10606 > URL: https://issues.apache.org/jira/browse/SPARK-10606 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Harish Butani >Priority: Critical > > Consider the following table: t(a : String, b : String) and the query > {code} > select a, concat(b, '1'), count(*) > from t > group by a, concat(b, '1') with cube > {code} > The projections in the Expand operator are not set up correctly. The expand > logic in Analyzer:expand is comparing grouping expressions against > child.output. So {{concat(b, '1')}} is never mapped to a null Literal. > A simple fix is to add a Rule to introduce a Projection below the > Cube/Rollup/GrpSet operator that additionally projects the > groupingExpressions that are missing in the child. > Marking this as Critical, because you get wrong results. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
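A possible user-level workaround, sketched here only to illustrate the projection idea from the description (this is not the Analyzer fix itself; column and table names are the ones from the example): materialize the derived expression under an alias first, so the grouping columns are plain attribute references.
{code}
import org.apache.spark.sql.functions.{col, concat, lit}

// Assumes a DataFrame `t` with string columns a and b, as in the example above.
val projected = t.withColumn("b1", concat(col("b"), lit("1")))
val cubed = projected.cube("a", "b1").count() // cube now groups on simple attributes
cubed.show()
{code}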
[jira] [Created] (SPARK-10656) select(df(*)) fails when a column has special characters
Nick Pritchard created SPARK-10656: -- Summary: select(df(*)) fails when a column has special characters Key: SPARK-10656 URL: https://issues.apache.org/jira/browse/SPARK-10656 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Reporter: Nick Pritchard Best explained with this example: {code} val df = sqlContext.read.json(sqlContext.sparkContext.makeRDD( """{"a.b": "c", "d": "e" }""" :: Nil)) df.select("*").show() //successful df.select(df("*")).show() //throws exception df.withColumnRenamed("d", "f").show() //also fails, possibly related {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8547) xgboost exploration
[ https://issues.apache.org/jira/browse/SPARK-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791472#comment-14791472 ] Tian Jian Wang commented on SPARK-8547: --- Venkata, I have started on this as a pet project before. If you haven't started yet, can I try? > xgboost exploration > --- > > Key: SPARK-8547 > URL: https://issues.apache.org/jira/browse/SPARK-8547 > Project: Spark > Issue Type: New Feature > Components: ML, MLlib >Reporter: Joseph K. Bradley > > There has been quite a bit of excitement around xgboost: > [https://github.com/dmlc/xgboost] > It improves the parallelism of boosting by mixing boosting and bagging (where > bagging makes the algorithm more parallel). > It would be worth exploring implementing this within MLlib (probably as a new > algorithm). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10644) Applications wait even if free executors are available
[ https://issues.apache.org/jira/browse/SPARK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791443#comment-14791443 ] Balagopal Nair commented on SPARK-10644: Standalone cluster manager. I've verified this behaviour again now. > Applications wait even if free executors are available > -- > > Key: SPARK-10644 > URL: https://issues.apache.org/jira/browse/SPARK-10644 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 1.5.0 > Environment: RHEL 6.5 64 bit >Reporter: Balagopal Nair > > Number of workers: 21 > Number of executors: 63 > Steps to reproduce: > 1. Run 4 jobs each with max cores set to 10 > 2. The first 3 jobs run with 10 each. (30 executors consumed so far) > 3. The 4th job waits even though there are 33 idle executors. > The reason is that a job will not get executors unless > the total number of EXECUTORS in use < the number of WORKERS > If there are executors available, resources should be allocated to the > pending job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
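"Max cores" in the reproduction steps refers to the standalone scheduler's per-application core cap; a minimal sketch (application name and value are illustrative) of how each of the four jobs would set it:
{code}
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative: caps this application at 10 cores on a standalone cluster,
// i.e. the "max cores set to 10" from the reproduction steps.
val conf = new SparkConf()
  .setAppName("job-1")
  .set("spark.cores.max", "10")
val sc = new SparkContext(conf)
{code}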
[jira] [Commented] (SPARK-10655) Enhance DB2 dialect to handle XML, DECIMAL, and DECFLOAT
[ https://issues.apache.org/jira/browse/SPARK-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791440#comment-14791440 ] Suresh Thalamati commented on SPARK-10655: -- I am working on a pull request for this issue. > Enhance DB2 dialect to handle XML, DECIMAL, and DECFLOAT > - > > Key: SPARK-10655 > URL: https://issues.apache.org/jira/browse/SPARK-10655 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.5.0 >Reporter: Suresh Thalamati > > Default type mapping does not work when reading from a DB2 table that contains > XML or DECFLOAT columns, or when writing DECIMAL columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10644) Applications wait even if free executors are available
[ https://issues.apache.org/jira/browse/SPARK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791439#comment-14791439 ] Saisai Shao commented on SPARK-10644: - So which cluster manager do you use: standalone, Mesos or YARN? There shouldn't be such a problem if resources are sufficient, as far as I know. > Applications wait even if free executors are available > -- > > Key: SPARK-10644 > URL: https://issues.apache.org/jira/browse/SPARK-10644 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 1.5.0 > Environment: RHEL 6.5 64 bit >Reporter: Balagopal Nair > > Number of workers: 21 > Number of executors: 63 > Steps to reproduce: > 1. Run 4 jobs each with max cores set to 10 > 2. The first 3 jobs run with 10 each. (30 executors consumed so far) > 3. The 4th job waits even though there are 33 idle executors. > The reason is that a job will not get executors unless > the total number of EXECUTORS in use < the number of WORKERS > If there are executors available, resources should be allocated to the > pending job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10655) Enhance DB2 dialect to handle XML, DECIMAL, and DECFLOAT
Suresh Thalamati created SPARK-10655: Summary: Enhance DB2 dialect to handle XML, DECIMAL, and DECFLOAT Key: SPARK-10655 URL: https://issues.apache.org/jira/browse/SPARK-10655 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.5.0 Reporter: Suresh Thalamati Default type mapping does not work when reading from a DB2 table that contains XML or DECFLOAT columns, or when writing DECIMAL columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10640) Spark history server fails to parse taskEndReasonFromJson TaskCommitDenied
[ https://issues.apache.org/jira/browse/SPARK-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791438#comment-14791438 ] Thomas Graves commented on SPARK-10640: --- Yes, a 1.5 history server reading 1.5.0 logs. I'm not as worried about forward compatibility, but it would be nice if we handled values like this and put blank or unknown so the log will at least be viewable. > Spark history server fails to parse taskEndReasonFromJson TaskCommitDenied > -- > > Key: SPARK-10640 > URL: https://issues.apache.org/jira/browse/SPARK-10640 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.3.0, 1.4.0, 1.5.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Critical > > I'm seeing an exception from the spark history server trying to read a > history file: > scala.MatchError: TaskCommitDenied (of class java.lang.String) > at > org.apache.spark.util.JsonProtocol$.taskEndReasonFromJson(JsonProtocol.scala:775) > at > org.apache.spark.util.JsonProtocol$.taskEndFromJson(JsonProtocol.scala:531) > at > org.apache.spark.util.JsonProtocol$.sparkEventFromJson(JsonProtocol.scala:488) > at > org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:58) > at > org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$replay(FsHistoryProvider.scala:457) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:292) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:289) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) > at > scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251) > at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105) > at > org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$mergeApplicationListing(FsHistoryProvider.scala:289) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$checkForLogs$1$$anon$2.run(FsHistoryProvider.scala:210) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
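As a purely illustrative sketch of the tolerant parsing being suggested (not the actual JsonProtocol code, which parses full JSON objects rather than bare strings), a reason parser could carry an "unknown" fallback instead of throwing a MatchError:
{code}
// Simplified, hypothetical sketch of a fallback case for unrecognized task-end reasons.
sealed trait Reason
case object Success extends Reason
case object TaskCommitDenied extends Reason
case class Unknown(raw: String) extends Reason

def reasonFromString(s: String): Reason = s match {
  case "Success"          => Success
  case "TaskCommitDenied" => TaskCommitDenied
  case other              => Unknown(other) // tolerate reasons this version doesn't know about
}
{code}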
[jira] [Comment Edited] (SPARK-10644) Applications wait even if free executors are available
[ https://issues.apache.org/jira/browse/SPARK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791434#comment-14791434 ] Balagopal Nair edited comment on SPARK-10644 at 9/17/15 1:51 AM: - No. These are independent jobs running under different SparkContexts. Sorry about not being clear enough before... I'm trying to share the same cluster between various applications. This issue is related to scheduling across applications and not within the same application. was (Author: nbalagopal): No. These are independent jobs running under different SparkContexts > Applications wait even if free executors are available > -- > > Key: SPARK-10644 > URL: https://issues.apache.org/jira/browse/SPARK-10644 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 1.5.0 > Environment: RHEL 6.5 64 bit >Reporter: Balagopal Nair > > Number of workers: 21 > Number of executors: 63 > Steps to reproduce: > 1. Run 4 jobs each with max cores set to 10 > 2. The first 3 jobs run with 10 each. (30 executors consumed so far) > 3. The 4th job waits even though there are 33 idle executors. > The reason is that a job will not get executors unless > the total number of EXECUTORS in use < the number of WORKERS > If there are executors available, resources should be allocated to the > pending job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10644) Applications wait even if free executors are available
[ https://issues.apache.org/jira/browse/SPARK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791434#comment-14791434 ] Balagopal Nair commented on SPARK-10644: No. These are independent jobs running under different SparkContexts. > Applications wait even if free executors are available > -- > > Key: SPARK-10644 > URL: https://issues.apache.org/jira/browse/SPARK-10644 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 1.5.0 > Environment: RHEL 6.5 64 bit >Reporter: Balagopal Nair > > Number of workers: 21 > Number of executors: 63 > Steps to reproduce: > 1. Run 4 jobs each with max cores set to 10 > 2. The first 3 jobs run with 10 each. (30 executors consumed so far) > 3. The 4th job waits even though there are 33 idle executors. > The reason is that a job will not get executors unless > the total number of EXECUTORS in use < the number of WORKERS > If there are executors available, resources should be allocated to the > pending job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10654) Add columnSimilarities to IndexedRowMatrix
[ https://issues.apache.org/jira/browse/SPARK-10654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10654: Assignee: Apache Spark > Add columnSimilarities to IndexedRowMatrix > -- > > Key: SPARK-10654 > URL: https://issues.apache.org/jira/browse/SPARK-10654 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: Reza Zadeh >Assignee: Apache Spark > > Add columnSimilarities to IndexedRowMatrix. > Adding rowSimilarities to IndexedRowMatrix is tracked in another JIRA, > SPARK-4823 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10654) Add columnSimilarities to IndexedRowMatrix
[ https://issues.apache.org/jira/browse/SPARK-10654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791425#comment-14791425 ] Apache Spark commented on SPARK-10654: -- User 'rezazadeh' has created a pull request for this issue: https://github.com/apache/spark/pull/8792 > Add columnSimilarities to IndexedRowMatrix > -- > > Key: SPARK-10654 > URL: https://issues.apache.org/jira/browse/SPARK-10654 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: Reza Zadeh > > Add columnSimilarities to IndexedRowMatrix. > Adding rowSimilarities to IndexedRowMatrix is tracked in another JIRA, > SPARK-4823 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10654) Add columnSimilarities to IndexedRowMatrix
[ https://issues.apache.org/jira/browse/SPARK-10654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10654: Assignee: (was: Apache Spark) > Add columnSimilarities to IndexedRowMatrix > -- > > Key: SPARK-10654 > URL: https://issues.apache.org/jira/browse/SPARK-10654 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: Reza Zadeh > > Add columnSimilarities to IndexedRowMatrix. > Adding rowSimilarities to IndexedRowMatrix is tracked in another JIRA, > SPARK-4823 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10654) Add columnSimilarities to IndexedRowMatrix
Reza Zadeh created SPARK-10654: -- Summary: Add columnSimilarities to IndexedRowMatrix Key: SPARK-10654 URL: https://issues.apache.org/jira/browse/SPARK-10654 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Reza Zadeh Add columnSimilarities to IndexedRowMatrix. Adding rowSimilarities to IndexedRowMatrix is tracked in another JIRA, SPARK-4823 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10625) Spark SQL JDBC read/write is unable to handle JDBC Drivers that add unserializable objects into connection properties
[ https://issues.apache.org/jira/browse/SPARK-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791420#comment-14791420 ] Apache Spark commented on SPARK-10625: -- User 'tribbloid' has created a pull request for this issue: https://github.com/apache/spark/pull/8785 > Spark SQL JDBC read/write is unable to handle JDBC Drivers that add > unserializable objects into connection properties > -- > > Key: SPARK-10625 > URL: https://issues.apache.org/jira/browse/SPARK-10625 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.1, 1.5.0 > Environment: Ubuntu 14.04 >Reporter: Peng Cheng > Labels: jdbc, spark, sparksql > > Some JDBC drivers (e.g. SAP HANA) try to optimize connection pooling by > adding new objects into the connection properties, which are then reused by > Spark and deployed to workers. When some of these new objects are not > serializable, this triggers an org.apache.spark.SparkException: Task not > serializable. The following test code snippet demonstrates this problem by > using a modified H2 driver: > test("INSERT to JDBC Datasource with UnserializableH2Driver") { > object UnserializableH2Driver extends org.h2.Driver { > override def connect(url: String, info: Properties): Connection = { > val result = super.connect(url, info) > info.put("unserializableDriver", this) > result > } > override def getParentLogger: Logger = ??? > } > import scala.collection.JavaConversions._ > val oldDrivers = > DriverManager.getDrivers.filter(_.acceptsURL("jdbc:h2:")).toSeq > oldDrivers.foreach{ > DriverManager.deregisterDriver > } > DriverManager.registerDriver(UnserializableH2Driver) > sql("INSERT INTO TABLE PEOPLE1 SELECT * FROM PEOPLE") > assert(2 === sqlContext.read.jdbc(url1, "TEST.PEOPLE1", properties).count) > assert(2 === sqlContext.read.jdbc(url1, "TEST.PEOPLE1", > properties).collect()(0).length) > DriverManager.deregisterDriver(UnserializableH2Driver) > oldDrivers.foreach{ > DriverManager.registerDriver > } > } -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
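One general mitigation sketch, hypothetical and not necessarily what the linked pull request does: copy only the string-valued entries of the connection Properties before they are captured by a task closure, so driver-injected objects never need to be serialized.
{code}
import java.util.Properties
import scala.collection.JavaConverters._

// Hypothetical helper: keep only plain string properties so that objects a driver
// stuffs into the Properties map never travel to executors.
def serializableCopy(props: Properties): Properties = {
  val copy = new Properties()
  for (key <- props.stringPropertyNames().asScala) {
    copy.setProperty(key, props.getProperty(key))
  }
  copy
}
{code}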
[jira] [Assigned] (SPARK-10625) Spark SQL JDBC read/write is unable to handle JDBC Drivers that add unserializable objects into connection properties
[ https://issues.apache.org/jira/browse/SPARK-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10625: Assignee: (was: Apache Spark) > Spark SQL JDBC read/write is unable to handle JDBC Drivers that add > unserializable objects into connection properties > -- > > Key: SPARK-10625 > URL: https://issues.apache.org/jira/browse/SPARK-10625 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.1, 1.5.0 > Environment: Ubuntu 14.04 >Reporter: Peng Cheng > Labels: jdbc, spark, sparksql > > Some JDBC drivers (e.g. SAP HANA) try to optimize connection pooling by > adding new objects into the connection properties, which are then reused by > Spark and deployed to workers. When some of these new objects are not > serializable, this triggers an org.apache.spark.SparkException: Task not > serializable. The following test code snippet demonstrates this problem by > using a modified H2 driver: > test("INSERT to JDBC Datasource with UnserializableH2Driver") { > object UnserializableH2Driver extends org.h2.Driver { > override def connect(url: String, info: Properties): Connection = { > val result = super.connect(url, info) > info.put("unserializableDriver", this) > result > } > override def getParentLogger: Logger = ??? > } > import scala.collection.JavaConversions._ > val oldDrivers = > DriverManager.getDrivers.filter(_.acceptsURL("jdbc:h2:")).toSeq > oldDrivers.foreach{ > DriverManager.deregisterDriver > } > DriverManager.registerDriver(UnserializableH2Driver) > sql("INSERT INTO TABLE PEOPLE1 SELECT * FROM PEOPLE") > assert(2 === sqlContext.read.jdbc(url1, "TEST.PEOPLE1", properties).count) > assert(2 === sqlContext.read.jdbc(url1, "TEST.PEOPLE1", > properties).collect()(0).length) > DriverManager.deregisterDriver(UnserializableH2Driver) > oldDrivers.foreach{ > DriverManager.registerDriver > } > } -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10625) Spark SQL JDBC read/write is unable to handle JDBC Drivers that add unserializable objects into connection properties
[ https://issues.apache.org/jira/browse/SPARK-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10625: Assignee: Apache Spark > Spark SQL JDBC read/write is unable to handle JDBC Drivers that add > unserializable objects into connection properties > -- > > Key: SPARK-10625 > URL: https://issues.apache.org/jira/browse/SPARK-10625 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.1, 1.5.0 > Environment: Ubuntu 14.04 >Reporter: Peng Cheng >Assignee: Apache Spark > Labels: jdbc, spark, sparksql > > Some JDBC drivers (e.g. SAP HANA) try to optimize connection pooling by > adding new objects into the connection properties, which are then reused by > Spark and deployed to workers. When some of these new objects are not > serializable, this triggers an org.apache.spark.SparkException: Task not > serializable. The following test code snippet demonstrates this problem by > using a modified H2 driver: > test("INSERT to JDBC Datasource with UnserializableH2Driver") { > object UnserializableH2Driver extends org.h2.Driver { > override def connect(url: String, info: Properties): Connection = { > val result = super.connect(url, info) > info.put("unserializableDriver", this) > result > } > override def getParentLogger: Logger = ??? > } > import scala.collection.JavaConversions._ > val oldDrivers = > DriverManager.getDrivers.filter(_.acceptsURL("jdbc:h2:")).toSeq > oldDrivers.foreach{ > DriverManager.deregisterDriver > } > DriverManager.registerDriver(UnserializableH2Driver) > sql("INSERT INTO TABLE PEOPLE1 SELECT * FROM PEOPLE") > assert(2 === sqlContext.read.jdbc(url1, "TEST.PEOPLE1", properties).count) > assert(2 === sqlContext.read.jdbc(url1, "TEST.PEOPLE1", > properties).collect()(0).length) > DriverManager.deregisterDriver(UnserializableH2Driver) > oldDrivers.foreach{ > DriverManager.registerDriver > } > } -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10640) Spark history server fails to parse taskEndReasonFromJson TaskCommitDenied
[ https://issues.apache.org/jira/browse/SPARK-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791416#comment-14791416 ] Josh Rosen commented on SPARK-10640: This is a 1.5.0 history server reading 1.5.0 logs? In principle we also have this bug when trying to read 1.5.x logs with a 1.4.x history server. I'm going to mark this as a 1.5.1 critical bug to make sure it gets fixed there. This probably affects 1.3.x and 1.4.x, too, so I'm going to update the affected versions. > Spark history server fails to parse taskEndReasonFromJson TaskCommitDenied > -- > > Key: SPARK-10640 > URL: https://issues.apache.org/jira/browse/SPARK-10640 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.3.0, 1.4.0, 1.5.0 >Reporter: Thomas Graves >Assignee: Thomas Graves > > I'm seeing an exception from the spark history server trying to read a > history file: > scala.MatchError: TaskCommitDenied (of class java.lang.String) > at > org.apache.spark.util.JsonProtocol$.taskEndReasonFromJson(JsonProtocol.scala:775) > at > org.apache.spark.util.JsonProtocol$.taskEndFromJson(JsonProtocol.scala:531) > at > org.apache.spark.util.JsonProtocol$.sparkEventFromJson(JsonProtocol.scala:488) > at > org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:58) > at > org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$replay(FsHistoryProvider.scala:457) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:292) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:289) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) > at > scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251) > at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105) > at > org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$mergeApplicationListing(FsHistoryProvider.scala:289) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$checkForLogs$1$$anon$2.run(FsHistoryProvider.scala:210) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10640) Spark history server fails to parse taskEndReasonFromJson TaskCommitDenied
[ https://issues.apache.org/jira/browse/SPARK-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-10640: --- Affects Version/s: 1.3.0 1.4.0 Target Version/s: 1.5.1 Priority: Critical (was: Major) > Spark history server fails to parse taskEndReasonFromJson TaskCommitDenied > -- > > Key: SPARK-10640 > URL: https://issues.apache.org/jira/browse/SPARK-10640 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.3.0, 1.4.0, 1.5.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Critical > > I'm seeing an exception from the spark history server trying to read a > history file: > scala.MatchError: TaskCommitDenied (of class java.lang.String) > at > org.apache.spark.util.JsonProtocol$.taskEndReasonFromJson(JsonProtocol.scala:775) > at > org.apache.spark.util.JsonProtocol$.taskEndFromJson(JsonProtocol.scala:531) > at > org.apache.spark.util.JsonProtocol$.sparkEventFromJson(JsonProtocol.scala:488) > at > org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:58) > at > org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$replay(FsHistoryProvider.scala:457) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:292) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:289) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) > at > scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251) > at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105) > at > org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$mergeApplicationListing(FsHistoryProvider.scala:289) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$checkForLogs$1$$anon$2.run(FsHistoryProvider.scala:210) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10635) pyspark - running on a different host
[ https://issues.apache.org/jira/browse/SPARK-10635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791389#comment-14791389 ] Josh Rosen commented on SPARK-10635: [~davies], do you think we should support this? This seems like a hard-to-support feature, so I'm inclined to say that this issue is "Won't Fix" as currently described. > pyspark - running on a different host > - > > Key: SPARK-10635 > URL: https://issues.apache.org/jira/browse/SPARK-10635 > Project: Spark > Issue Type: Improvement >Reporter: Ben Duffield > > At various points we assume we only ever talk to a driver on the same host. > e.g. > https://github.com/apache/spark/blob/v1.4.1/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala#L615 > We use pyspark to connect to an existing driver (i.e. do not let pyspark > launch the driver itself, but instead construct the SparkContext with the > gateway and jsc arguments. > There are a few reasons for this, but essentially it's to allow more > flexibility when running in AWS. > Before 1.3.1 we were able to monkeypatch around this: > {code} > def _load_from_socket(port, serializer): > sock = socket.socket() > sock.settimeout(3) > try: > sock.connect((host, port)) > rf = sock.makefile("rb", 65536) > for item in serializer.load_stream(rf): > yield item > finally: > sock.close() > pyspark.rdd._load_from_socket = _load_from_socket > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10653) Remove unnecessary things from SparkEnv
[ https://issues.apache.org/jira/browse/SPARK-10653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791388#comment-14791388 ] Josh Rosen commented on SPARK-10653: Note that SparkEnv is technically a developer API, but all of its fields point to things which are non-developer-API. Thus I feel that there's not a compatibility concern here, but others might disagree. > Remove unnecessary things from SparkEnv > --- > > Key: SPARK-10653 > URL: https://issues.apache.org/jira/browse/SPARK-10653 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Andrew Or > > As of the writing of this message, there are at least two things that can be > removed from it: > {code} > @DeveloperApi > class SparkEnv ( > val executorId: String, > private[spark] val rpcEnv: RpcEnv, > val serializer: Serializer, > val closureSerializer: Serializer, > val cacheManager: CacheManager, > val mapOutputTracker: MapOutputTracker, > val shuffleManager: ShuffleManager, > val broadcastManager: BroadcastManager, > val blockTransferService: BlockTransferService, // this one can go > val blockManager: BlockManager, > val securityManager: SecurityManager, > val httpFileServer: HttpFileServer, > val sparkFilesDir: String, // this one maybe? It's only used in 1 place. > val metricsSystem: MetricsSystem, > val shuffleMemoryManager: ShuffleMemoryManager, > val executorMemoryManager: ExecutorMemoryManager, // this can go > val outputCommitCoordinator: OutputCommitCoordinator, > val conf: SparkConf) extends Logging { > ... > } > {code} > We should avoid adding to this infinite list of things in SparkEnv's > constructors if they're not needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10647) Mesos HA mode misuses spark.deploy.zookeeper.dir property; configs should be documented
[ https://issues.apache.org/jira/browse/SPARK-10647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-10647: --- Affects Version/s: 1.4.1 1.5.0 > Mesos HA mode misuses spark.deploy.zookeeper.dir property; configs should be > documented > --- > > Key: SPARK-10647 > URL: https://issues.apache.org/jira/browse/SPARK-10647 > Project: Spark > Issue Type: Improvement > Components: Mesos >Affects Versions: 1.4.1, 1.5.0 >Reporter: Alan Braithwaite >Assignee: Timothy Chen >Priority: Minor > > This property doesn't match up with the other properties surrounding it, > namely: > spark.mesos.deploy.zookeeper.url > and > spark.mesos.deploy.recoveryMode > Since it's also a property specific to mesos, it makes sense to be under that > hierarchy as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10647) Mesos HA mode misuses spark.deploy.zookeeper.dir property; configs should be documented
[ https://issues.apache.org/jira/browse/SPARK-10647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-10647: --- Issue Type: Bug (was: Improvement) > Mesos HA mode misuses spark.deploy.zookeeper.dir property; configs should be > documented > --- > > Key: SPARK-10647 > URL: https://issues.apache.org/jira/browse/SPARK-10647 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 1.4.1, 1.5.0 >Reporter: Alan Braithwaite >Assignee: Timothy Chen >Priority: Minor > > This property doesn't match up with the other properties surrounding it, > namely: > spark.mesos.deploy.zookeeper.url > and > spark.mesos.deploy.recoveryMode > Since it's also a property specific to mesos, it makes sense to be under that > hierarchy as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10647) Rename property spark.deploy.zookeeper.dir to spark.mesos.deploy.zookeeper.dir
[ https://issues.apache.org/jira/browse/SPARK-10647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791387#comment-14791387 ] Josh Rosen commented on SPARK-10647: The spark.deploy.zookeeper.* properties are used by the standalone mode's HA recovery features (https://spark.apache.org/docs/latest/spark-standalone.html#high-availability). I think the correct fix here is to update the Mesos code to use spark.deploy.mesos.zookeeper.dir (https://github.com/apache/spark/pull/5144/files#diff-3c5e5516915ada1d89f1259de069R97). We should also update the Mesos documentation to mention these configurations, since they don't appear to be documented anywhere. [~tnachen], I'm going to assign the doc updates and bugfixes to you. > Rename property spark.deploy.zookeeper.dir to spark.mesos.deploy.zookeeper.dir > -- > > Key: SPARK-10647 > URL: https://issues.apache.org/jira/browse/SPARK-10647 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Alan Braithwaite >Priority: Minor > > This property doesn't match up with the other properties surrounding it, > namely: > spark.mesos.deploy.zookeeper.url > and > spark.mesos.deploy.recoveryMode > Since it's also a property specific to mesos, it makes sense to be under that > hierarchy as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
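As a rough illustration of the direction described in the comment above (an assumption about the eventual patch, not the patch itself), the Mesos cluster-mode code could read a Mesos-scoped key and fall back to the legacy key only for compatibility, so that spark.deploy.zookeeper.dir stops being overloaded by Mesos HA mode. The key names follow the comment; the default value is an assumption made only for this sketch.
{code}
import org.apache.spark.SparkConf

object MesosZookeeperDirSketch {
  // Hypothetical helper, not Spark's actual config-resolution code.
  def zookeeperDir(conf: SparkConf): String =
    conf.getOption("spark.deploy.mesos.zookeeper.dir")
      .orElse(conf.getOption("spark.deploy.zookeeper.dir")) // legacy key shared with standalone HA
      .getOrElse("/spark_mesos_dispatcher")                 // assumed default, for illustration only

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf(loadDefaults = false)
      .set("spark.deploy.mesos.zookeeper.dir", "/mesos_dispatcher")
    println(zookeeperDir(conf)) // /mesos_dispatcher
  }
}
{code}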
[jira] [Updated] (SPARK-10647) Rename property spark.deploy.zookeeper.dir to spark.mesos.deploy.zookeeper.dir
[ https://issues.apache.org/jira/browse/SPARK-10647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-10647: --- Assignee: Timothy Chen > Rename property spark.deploy.zookeeper.dir to spark.mesos.deploy.zookeeper.dir > -- > > Key: SPARK-10647 > URL: https://issues.apache.org/jira/browse/SPARK-10647 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Alan Braithwaite >Assignee: Timothy Chen >Priority: Minor > > This property doesn't match up with the other properties surrounding it, > namely: > spark.mesos.deploy.zookeeper.url > and > spark.mesos.deploy.recoveryMode > Since it's also a property specific to mesos, it makes sense to be under that > hierarchy as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10647) Mesos HA mode misuses spark.deploy.zookeeper.dir property; configs should be documented
[ https://issues.apache.org/jira/browse/SPARK-10647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-10647: --- Summary: Mesos HA mode misuses spark.deploy.zookeeper.dir property; configs should be documented (was: Rename property spark.deploy.zookeeper.dir to spark.mesos.deploy.zookeeper.dir) > Mesos HA mode misuses spark.deploy.zookeeper.dir property; configs should be > documented > --- > > Key: SPARK-10647 > URL: https://issues.apache.org/jira/browse/SPARK-10647 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Alan Braithwaite >Assignee: Timothy Chen >Priority: Minor > > This property doesn't match up with the other properties surrounding it, > namely: > spark.mesos.deploy.zookeeper.url > and > spark.mesos.deploy.recoveryMode > Since it's also a property specific to mesos, it makes sense to be under that > hierarchy as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10647) Rename property spark.deploy.zookeeper.dir to spark.mesos.deploy.zookeeper.dir
[ https://issues.apache.org/jira/browse/SPARK-10647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-10647: --- Component/s: Mesos > Rename property spark.deploy.zookeeper.dir to spark.mesos.deploy.zookeeper.dir > -- > > Key: SPARK-10647 > URL: https://issues.apache.org/jira/browse/SPARK-10647 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Alan Braithwaite >Priority: Minor > > This property doesn't match up with the other properties surrounding it, > namely: > spark.mesos.deploy.zookeeper.url > and > spark.mesos.deploy.recoveryMode > Since it's also a property specific to mesos, it makes sense to be under that > hierarchy as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10644) Applications wait even if free executors are available
[ https://issues.apache.org/jira/browse/SPARK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791370#comment-14791370 ] Saisai Shao commented on SPARK-10644: - Do your jobs have dependencies? That is to say, does the 4th job rely on the first 3 jobs finishing and returning their results? > Applications wait even if free executors are available > -- > > Key: SPARK-10644 > URL: https://issues.apache.org/jira/browse/SPARK-10644 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 1.5.0 > Environment: RHEL 6.5 64 bit >Reporter: Balagopal Nair > > Number of workers: 21 > Number of executors: 63 > Steps to reproduce: > 1. Run 4 jobs each with max cores set to 10 > 2. The first 3 jobs run with 10 each. (30 executors consumed so far) > 3. The 4th job waits even though there are 33 idle executors. > The reason is that a job will not get executors unless > the total number of EXECUTORS in use < the number of WORKERS > If there are executors available, resources should be allocated to the > pending job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
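To make the reporter's diagnosis concrete, here is a small sketch of the claimed gating condition (an illustration of the report, not the actual Master scheduling code): once the number of executors in use reaches the number of workers, a waiting application gets nothing even though idle executors remain.
{code}
object ExecutorGatingSketch {
  // Simplified form of the condition quoted in the report.
  def canLaunchMore(executorsInUse: Int, totalExecutors: Int, workers: Int): Boolean =
    executorsInUse < workers && executorsInUse < totalExecutors

  def main(args: Array[String]): Unit = {
    val workers = 21
    val totalExecutors = 63
    // Three jobs holding 10 executors each: 30 in use, 33 idle, yet the 4th job waits.
    println(canLaunchMore(30, totalExecutors, workers)) // false
  }
}
{code}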
[jira] [Assigned] (SPARK-10652) Set good job descriptions for streaming related jobs
[ https://issues.apache.org/jira/browse/SPARK-10652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10652: Assignee: Tathagata Das (was: Apache Spark) > Set good job descriptions for streaming related jobs > > > Key: SPARK-10652 > URL: https://issues.apache.org/jira/browse/SPARK-10652 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.4.1, 1.5.0 >Reporter: Tathagata Das >Assignee: Tathagata Das > > Job descriptions will help distinguish jobs of one batch from the other in > the Jobs and Stages pages in the Spark UI -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10652) Set good job descriptions for streaming related jobs
[ https://issues.apache.org/jira/browse/SPARK-10652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10652: Assignee: Apache Spark (was: Tathagata Das) > Set good job descriptions for streaming related jobs > > > Key: SPARK-10652 > URL: https://issues.apache.org/jira/browse/SPARK-10652 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.4.1, 1.5.0 >Reporter: Tathagata Das >Assignee: Apache Spark > > Job descriptions will help distinguish jobs of one batch from the other in > the Jobs and Stages pages in the Spark UI -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10652) Set good job descriptions for streaming related jobs
[ https://issues.apache.org/jira/browse/SPARK-10652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791357#comment-14791357 ] Apache Spark commented on SPARK-10652: -- User 'tdas' has created a pull request for this issue: https://github.com/apache/spark/pull/8791 > Set good job descriptions for streaming related jobs > > > Key: SPARK-10652 > URL: https://issues.apache.org/jira/browse/SPARK-10652 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.4.1, 1.5.0 >Reporter: Tathagata Das >Assignee: Tathagata Das > > Job descriptions will help distinguish jobs of one batch from the other in > the Jobs and Stages pages in the Spark UI -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10381) Infinite loop when OutputCommitCoordination is enabled and OutputCommitter.commitTask throws exception
[ https://issues.apache.org/jira/browse/SPARK-10381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791354#comment-14791354 ] Apache Spark commented on SPARK-10381: -- User 'JoshRosen' has created a pull request for this issue: https://github.com/apache/spark/pull/8790 > Infinite loop when OutputCommitCoordination is enabled and > OutputCommitter.commitTask throws exception > -- > > Key: SPARK-10381 > URL: https://issues.apache.org/jira/browse/SPARK-10381 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 1.3.1, 1.4.1, 1.5.0 >Reporter: Josh Rosen >Assignee: Josh Rosen >Priority: Critical > Fix For: 1.6.0, 1.5.1 > > > When speculative execution is enabled, consider a scenario where the > authorized committer of a particular output partition fails during the > OutputCommitter.commitTask() call. In this case, the OutputCommitCoordinator > is supposed to release that committer's exclusive lock on committing once > that task fails. However, due to a unit mismatch the lock will not be > released, causing Spark to go into an infinite retry loop. > This bug was masked by the fact that the OutputCommitCoordinator does not > have enough end-to-end tests (the current tests use many mocks). Other > factors contributing to this bug are the fact that we have many > similarly-named identifiers that have different semantics but the same data > types (e.g. attemptNumber and taskAttemptId, with inconsistent variable > naming which makes them difficult to distinguish). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
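The description pins the infinite loop on a unit mismatch between attemptNumber and taskAttemptId. The sketch below is an assumed simplification of the coordinator, not its real code; it shows how comparing two identifiers of the same type but different meanings keeps the commit lock held forever, and how comparing like with like releases it.
{code}
import scala.collection.mutable

object CommitLockSketch {
  // partition -> attemptNumber of the currently authorized committer
  private val authorized = mutable.Map.empty[Int, Int]

  // The first attempt to ask wins the exclusive right to commit that partition.
  def canCommit(partition: Int, attemptNumber: Int): Boolean =
    authorized.getOrElseUpdate(partition, attemptNumber) == attemptNumber

  // Buggy shape: the failure path is handed a global task attempt id while the map stores
  // the per-stage attempt number, so the comparison never matches and the lock stays held.
  def onTaskFailedBuggy(partition: Int, taskAttemptId: Int): Unit =
    if (authorized.get(partition).contains(taskAttemptId)) authorized.remove(partition)

  // Fixed shape: compare attempt numbers with attempt numbers.
  def onTaskFailedFixed(partition: Int, attemptNumber: Int): Unit =
    if (authorized.get(partition).contains(attemptNumber)) authorized.remove(partition)

  def main(args: Array[String]): Unit = {
    println(canCommit(partition = 0, attemptNumber = 0)) // true: attempt 0 is authorized
    onTaskFailedBuggy(partition = 0, taskAttemptId = 17) // wrong unit: lock not released
    println(canCommit(partition = 0, attemptNumber = 1)) // false: the speculative retry is stuck
    onTaskFailedFixed(partition = 0, attemptNumber = 0)  // right unit: lock released
    println(canCommit(partition = 0, attemptNumber = 1)) // true: the retry can now commit
  }
}
{code}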
[jira] [Created] (SPARK-10653) Remove unnecessary things from SparkEnv
Andrew Or created SPARK-10653: - Summary: Remove unnecessary things from SparkEnv Key: SPARK-10653 URL: https://issues.apache.org/jira/browse/SPARK-10653 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0 Reporter: Andrew Or As of the writing of this message, there are at least two things that can be removed from it: {code} @DeveloperApi class SparkEnv ( val executorId: String, private[spark] val rpcEnv: RpcEnv, val serializer: Serializer, val closureSerializer: Serializer, val cacheManager: CacheManager, val mapOutputTracker: MapOutputTracker, val shuffleManager: ShuffleManager, val broadcastManager: BroadcastManager, val blockTransferService: BlockTransferService, // this one can go val blockManager: BlockManager, val securityManager: SecurityManager, val httpFileServer: HttpFileServer, val sparkFilesDir: String, // this one maybe? It's only used in 1 place. val metricsSystem: MetricsSystem, val shuffleMemoryManager: ShuffleMemoryManager, val executorMemoryManager: ExecutorMemoryManager, // this can go val outputCommitCoordinator: OutputCommitCoordinator, val conf: SparkConf) extends Logging { ... } {code} We should avoid adding to this infinite list of things in SparkEnv's constructors if they're not needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10652) Set good job descriptions for streaming related jobs
Tathagata Das created SPARK-10652: - Summary: Set good job descriptions for streaming related jobs Key: SPARK-10652 URL: https://issues.apache.org/jira/browse/SPARK-10652 Project: Spark Issue Type: Improvement Components: Streaming Affects Versions: 1.5.0, 1.4.1 Reporter: Tathagata Das Assignee: Tathagata Das Job descriptions will help distinguish jobs of one batch from the other in the Jobs and Stages pages in the Spark UI -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10649) Streaming jobs unexpectedly inherits job group, job descriptions from context starting thread
[ https://issues.apache.org/jira/browse/SPARK-10649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-10649: -- Description: The job group, and job descriptions information is passed through thread local properties, and get inherited by child threads. In case of spark streaming, the streaming jobs inherit these properties from the thread that called streamingContext.start(). This may not make sense. 1. Job group: This is mainly used for cancelling a group of jobs together. It does not make sense to cancel streaming jobs like this, as the effect will be unpredictable. And its not a valid usecase any way, to cancel a streaming context, call streamingContext.stop() 2. Job description: This is used to pass on nice text descriptions for jobs to show up in the UI. The job description of the thread that calls streamingContext.start() is not useful for all the streaming jobs, as it does not make sense for all of the streaming jobs to have the same description, and the description may or may not be related to streaming. was: The job group, job descriptions and scheduler pool information is passed through thread local properties, and get inherited by child threads. In case of spark streaming, the streaming jobs inherit these properties from the thread that called streamingContext.start(). This may not make sense. 1. Job group: This is mainly used for cancelling a group of jobs together. It does not make sense to cancel streaming jobs like this, as the effect will be unpredictable. And its not a valid usecase any way, to cancel a streaming context, call streamingContext.stop() 2. Job description: This is used to pass on nice text descriptions for jobs to show up in the UI. The job description of the thread that calls streamingContext.start() is not useful for all the streaming jobs, as it does not make sense for all of the streaming jobs to have the same description, and the description may or may not be related to streaming. > Streaming jobs unexpectedly inherits job group, job descriptions from context > starting thread > - > > Key: SPARK-10649 > URL: https://issues.apache.org/jira/browse/SPARK-10649 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.3.1, 1.4.1, 1.5.0 >Reporter: Tathagata Das >Assignee: Tathagata Das > > The job group, and job descriptions information is passed through thread > local properties, and get inherited by child threads. In case of spark > streaming, the streaming jobs inherit these properties from the thread that > called streamingContext.start(). This may not make sense. > 1. Job group: This is mainly used for cancelling a group of jobs together. It > does not make sense to cancel streaming jobs like this, as the effect will be > unpredictable. And its not a valid usecase any way, to cancel a streaming > context, call streamingContext.stop() > 2. Job description: This is used to pass on nice text descriptions for jobs > to show up in the UI. The job description of the thread that calls > streamingContext.start() is not useful for all the streaming jobs, as it does > not make sense for all of the streaming jobs to have the same description, > and the description may or may not be related to streaming. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
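The inheritance described in this issue comes from thread-local job properties that child threads copy from the thread that created them. A minimal sketch of that mechanism using a plain InheritableThreadLocal (Spark's own properties object is not reproduced here):
{code}
object InheritedJobPropsSketch {
  private val jobDescription = new InheritableThreadLocal[String] {
    override def initialValue(): String = "<none>"
  }

  def main(args: Array[String]): Unit = {
    // The thread that calls streamingContext.start() happens to have a description set...
    jobDescription.set("nightly batch load")

    // ...and the scheduler thread it spawns silently inherits it, so every streaming job
    // submitted from that thread would show the unrelated description in the UI.
    val schedulerThread = new Thread(new Runnable {
      override def run(): Unit =
        println(s"streaming job runs with description: ${jobDescription.get()}")
    })
    schedulerThread.start()
    schedulerThread.join()
  }
}
{code}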
[jira] [Commented] (SPARK-10381) Infinite loop when OutputCommitCoordination is enabled and OutputCommitter.commitTask throws exception
[ https://issues.apache.org/jira/browse/SPARK-10381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791347#comment-14791347 ] Apache Spark commented on SPARK-10381: -- User 'JoshRosen' has created a pull request for this issue: https://github.com/apache/spark/pull/8789 > Infinite loop when OutputCommitCoordination is enabled and > OutputCommitter.commitTask throws exception > -- > > Key: SPARK-10381 > URL: https://issues.apache.org/jira/browse/SPARK-10381 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 1.3.1, 1.4.1, 1.5.0 >Reporter: Josh Rosen >Assignee: Josh Rosen >Priority: Critical > Fix For: 1.6.0, 1.5.1 > > > When speculative execution is enabled, consider a scenario where the > authorized committer of a particular output partition fails during the > OutputCommitter.commitTask() call. In this case, the OutputCommitCoordinator > is supposed to release that committer's exclusive lock on committing once > that task fails. However, due to a unit mismatch the lock will not be > released, causing Spark to go into an infinite retry loop. > This bug was masked by the fact that the OutputCommitCoordinator does not > have enough end-to-end tests (the current tests use many mocks). Other > factors contributing to this bug are the fact that we have many > similarly-named identifiers that have different semantics but the same data > types (e.g. attemptNumber and taskAttemptId, with inconsistent variable > naming which makes them difficult to distinguish). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10639) Need to convert UDAF's result from scala to sql type
[ https://issues.apache.org/jira/browse/SPARK-10639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10639: Assignee: Apache Spark > Need to convert UDAF's result from scala to sql type > > > Key: SPARK-10639 > URL: https://issues.apache.org/jira/browse/SPARK-10639 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.0 >Reporter: Yin Huai >Assignee: Apache Spark >Priority: Blocker > > We are missing a conversion at > https://github.com/apache/spark/blob/branch-1.5/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/udaf.scala#L427. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10639) Need to convert UDAF's result from scala to sql type
[ https://issues.apache.org/jira/browse/SPARK-10639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791302#comment-14791302 ] Apache Spark commented on SPARK-10639: -- User 'yhuai' has created a pull request for this issue: https://github.com/apache/spark/pull/8788 > Need to convert UDAF's result from scala to sql type > > > Key: SPARK-10639 > URL: https://issues.apache.org/jira/browse/SPARK-10639 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.0 >Reporter: Yin Huai >Priority: Blocker > > We are missing a conversion at > https://github.com/apache/spark/blob/branch-1.5/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/udaf.scala#L427. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10639) Need to convert UDAF's result from scala to sql type
[ https://issues.apache.org/jira/browse/SPARK-10639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10639: Assignee: (was: Apache Spark) > Need to convert UDAF's result from scala to sql type > > > Key: SPARK-10639 > URL: https://issues.apache.org/jira/browse/SPARK-10639 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.0 >Reporter: Yin Huai >Priority: Blocker > > We are missing a conversion at > https://github.com/apache/spark/blob/branch-1.5/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/udaf.scala#L427. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10058) Flaky test: HeartbeatReceiverSuite: normal heartbeat
[ https://issues.apache.org/jira/browse/SPARK-10058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10058: -- Issue Type: Bug (was: Test) > Flaky test: HeartbeatReceiverSuite: normal heartbeat > > > Key: SPARK-10058 > URL: https://issues.apache.org/jira/browse/SPARK-10058 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Reporter: Davies Liu >Assignee: Andrew Or >Priority: Blocker > Labels: flaky-test > > https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-1.5-SBT/116/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/testReport/junit/org.apache.spark/HeartbeatReceiverSuite/normal_heartbeat/ > {code} > Error Message > 3 did not equal 2 > Stacktrace > sbt.ForkMain$ForkError: 3 did not equal 2 > at > org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply$mcV$sp(HeartbeatReceiverSuite.scala:104) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97) > at > org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42) > at > org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) > at > org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255) > at > org.apache.spark.HeartbeatReceiverSuite.runTest(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) > at scala.collection.immutable.List.foreach(List.scala:318) > at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > at > org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) > at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) > at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) > at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) > at org.scalatest.Suite$class.run(Suite.scala:1424) > at > org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at 
org.scalatest.SuperEngine.runImpl(Engine.scala:545) > at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) > at > org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterAll$$super$run(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257) > at > org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256) > at > org.apache.spark.HeartbeatReceiverSuite.run(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) > at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671) > at sbt.ForkMain$Run$2.call(ForkMain.java:294) > at sbt.ForkMain$Run$2.call(ForkMain.java:284) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at >
[jira] [Updated] (SPARK-10650) Spark docs include test and other extra classes
[ https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-10650: -- Target Version/s: 1.6.0, 1.5.1 (was: 1.5.1) > Spark docs include test and other extra classes > --- > > Key: SPARK-10650 > URL: https://issues.apache.org/jira/browse/SPARK-10650 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.0 >Reporter: Patrick Wendell >Assignee: Michael Armbrust >Priority: Critical > > In 1.5.0 there are some extra classes in the Spark docs - including a bunch > of test classes. We need to figure out what commit introduced those and fix > it. The obvious things like genJavadoc version have not changed. > http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ > [before] > http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ > [after] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10058) Flaky test: HeartbeatReceiverSuite: normal heartbeat
[ https://issues.apache.org/jira/browse/SPARK-10058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10058: -- Component/s: Tests > Flaky test: HeartbeatReceiverSuite: normal heartbeat > > > Key: SPARK-10058 > URL: https://issues.apache.org/jira/browse/SPARK-10058 > Project: Spark > Issue Type: Test > Components: Spark Core, Tests >Reporter: Davies Liu >Assignee: Andrew Or >Priority: Blocker > Labels: flaky-test > > https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-1.5-SBT/116/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/testReport/junit/org.apache.spark/HeartbeatReceiverSuite/normal_heartbeat/ > {code} > Error Message > 3 did not equal 2 > Stacktrace > sbt.ForkMain$ForkError: 3 did not equal 2 > at > org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply$mcV$sp(HeartbeatReceiverSuite.scala:104) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97) > at > org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42) > at > org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) > at > org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255) > at > org.apache.spark.HeartbeatReceiverSuite.runTest(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) > at scala.collection.immutable.List.foreach(List.scala:318) > at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > at > org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) > at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) > at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) > at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) > at org.scalatest.Suite$class.run(Suite.scala:1424) > at > org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at 
org.scalatest.SuperEngine.runImpl(Engine.scala:545) > at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) > at > org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterAll$$super$run(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257) > at > org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256) > at > org.apache.spark.HeartbeatReceiverSuite.run(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) > at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671) > at sbt.ForkMain$Run$2.call(ForkMain.java:294) > at sbt.ForkMain$Run$2.call(ForkMain.java:284) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.uti
[jira] [Updated] (SPARK-10651) Flaky test: BroadcastSuite
[ https://issues.apache.org/jira/browse/SPARK-10651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10651: -- Labels: flaky-test (was: ) > Flaky test: BroadcastSuite > -- > > Key: SPARK-10651 > URL: https://issues.apache.org/jira/browse/SPARK-10651 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 1.6.0 >Reporter: Xiangrui Meng >Assignee: Shixiong Zhu >Priority: Blocker > Labels: flaky-test > Attachments: BroadcastSuiteFailures.csv > > > Saw many failures recently in master build. See attached CSV for a full list. > Most of the error messages are: > {code} > Can't find 2 executors before 1 milliseconds elapsed > {code} > . -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10058) Flaky test: HeartbeatReceiverSuite: normal heartbeat
[ https://issues.apache.org/jira/browse/SPARK-10058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791286#comment-14791286 ] Xiangrui Meng commented on SPARK-10058: --- Changed the priority to Blocker since this failed master builds frequently. > Flaky test: HeartbeatReceiverSuite: normal heartbeat > > > Key: SPARK-10058 > URL: https://issues.apache.org/jira/browse/SPARK-10058 > Project: Spark > Issue Type: Test > Components: Spark Core >Reporter: Davies Liu >Assignee: Andrew Or >Priority: Blocker > Labels: flaky-test > > https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-1.5-SBT/116/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/testReport/junit/org.apache.spark/HeartbeatReceiverSuite/normal_heartbeat/ > {code} > Error Message > 3 did not equal 2 > Stacktrace > sbt.ForkMain$ForkError: 3 did not equal 2 > at > org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply$mcV$sp(HeartbeatReceiverSuite.scala:104) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97) > at > org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42) > at > org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) > at > org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255) > at > org.apache.spark.HeartbeatReceiverSuite.runTest(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) > at scala.collection.immutable.List.foreach(List.scala:318) > at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > at > org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) > at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) > at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) > at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) > at org.scalatest.Suite$class.run(Suite.scala:1424) > at > org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) > at > 
org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at org.scalatest.SuperEngine.runImpl(Engine.scala:545) > at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) > at > org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterAll$$super$run(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257) > at > org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256) > at > org.apache.spark.HeartbeatReceiverSuite.run(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) > at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671) > at sbt.ForkMain$Run$2.call(ForkMain.java:294) > at sbt.ForkMain$Run$2.call(ForkMain.java:284) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at
[jira] [Assigned] (SPARK-10650) Spark docs include test and other extra classes
[ https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10650: Assignee: Apache Spark (was: Michael Armbrust) > Spark docs include test and other extra classes > --- > > Key: SPARK-10650 > URL: https://issues.apache.org/jira/browse/SPARK-10650 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.0 >Reporter: Patrick Wendell >Assignee: Apache Spark >Priority: Critical > > In 1.5.0 there are some extra classes in the Spark docs - including a bunch > of test classes. We need to figure out what commit introduced those and fix > it. The obvious things like genJavadoc version have not changed. > http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ > [before] > http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ > [after] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10650) Spark docs include test and other extra classes
[ https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791285#comment-14791285 ] Apache Spark commented on SPARK-10650: -- User 'marmbrus' has created a pull request for this issue: https://github.com/apache/spark/pull/8787 > Spark docs include test and other extra classes > --- > > Key: SPARK-10650 > URL: https://issues.apache.org/jira/browse/SPARK-10650 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.0 >Reporter: Patrick Wendell >Assignee: Michael Armbrust >Priority: Critical > > In 1.5.0 there are some extra classes in the Spark docs - including a bunch > of test classes. We need to figure out what commit introduced those and fix > it. The obvious things like genJavadoc version have not changed. > http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ > [before] > http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ > [after] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10650) Spark docs include test and other extra classes
[ https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10650: Assignee: Michael Armbrust (was: Apache Spark) > Spark docs include test and other extra classes > --- > > Key: SPARK-10650 > URL: https://issues.apache.org/jira/browse/SPARK-10650 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.0 >Reporter: Patrick Wendell >Assignee: Michael Armbrust >Priority: Critical > > In 1.5.0 there are some extra classes in the Spark docs - including a bunch > of test classes. We need to figure out what commit introduced those and fix > it. The obvious things like genJavadoc version have not changed. > http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ > [before] > http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ > [after] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10058) Flaky test: HeartbeatReceiverSuite: normal heartbeat
[ https://issues.apache.org/jira/browse/SPARK-10058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10058: -- Priority: Blocker (was: Critical) > Flaky test: HeartbeatReceiverSuite: normal heartbeat > > > Key: SPARK-10058 > URL: https://issues.apache.org/jira/browse/SPARK-10058 > Project: Spark > Issue Type: Test > Components: Spark Core >Reporter: Davies Liu >Assignee: Andrew Or >Priority: Blocker > Labels: flaky-test > > https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-1.5-SBT/116/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/testReport/junit/org.apache.spark/HeartbeatReceiverSuite/normal_heartbeat/ > {code} > Error Message > 3 did not equal 2 > Stacktrace > sbt.ForkMain$ForkError: 3 did not equal 2 > at > org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply$mcV$sp(HeartbeatReceiverSuite.scala:104) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97) > at > org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42) > at > org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) > at > org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255) > at > org.apache.spark.HeartbeatReceiverSuite.runTest(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) > at scala.collection.immutable.List.foreach(List.scala:318) > at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > at > org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) > at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) > at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) > at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) > at org.scalatest.Suite$class.run(Suite.scala:1424) > at > org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at 
org.scalatest.SuperEngine.runImpl(Engine.scala:545) > at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) > at > org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterAll$$super$run(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257) > at > org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256) > at > org.apache.spark.HeartbeatReceiverSuite.run(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) > at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671) > at sbt.ForkMain$Run$2.call(ForkMain.java:294) > at sbt.ForkMain$Run$2.call(ForkMain.java:284) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at >
[jira] [Updated] (SPARK-10651) Flaky test: BroadcastSuite
[ https://issues.apache.org/jira/browse/SPARK-10651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10651: -- Description: Saw many failures recently in master build. See attached CSV for a full list. Most of the error messages are: {code} Can't find 2 executors before 1 milliseconds elapsed {code} . was: Saw many failures recently in master build. See attached CSV for a full list. Most of the error messages are: Can't find 2 executors before 1 milliseconds elapsed . > Flaky test: BroadcastSuite > -- > > Key: SPARK-10651 > URL: https://issues.apache.org/jira/browse/SPARK-10651 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 1.6.0 >Reporter: Xiangrui Meng >Assignee: Shixiong Zhu >Priority: Blocker > Attachments: BroadcastSuiteFailures.csv > > > Saw many failures recently in master build. See attached CSV for a full list. > Most of the error messages are: > {code} > Can't find 2 executors before 1 milliseconds elapsed > {code} > . -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10651) Flaky test: BroadcastSuite
[ https://issues.apache.org/jira/browse/SPARK-10651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10651: -- Description: Saw many failures recently in master build. See attached CSV for a full list. Most of the error messages are: Can't find 2 executors before 1 milliseconds elapsed . was: Saw many failures recently in master build. See attached CSV for a full list. > Flaky test: BroadcastSuite > -- > > Key: SPARK-10651 > URL: https://issues.apache.org/jira/browse/SPARK-10651 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 1.6.0 >Reporter: Xiangrui Meng >Assignee: Shixiong Zhu >Priority: Blocker > Attachments: BroadcastSuiteFailures.csv > > > Saw many failures recently in master build. See attached CSV for a full list. > Most of the error messages are: Can't find 2 executors before 1 > milliseconds elapsed > . -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10651) Flaky test: BroadcastSuite
[ https://issues.apache.org/jira/browse/SPARK-10651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10651: -- Attachment: BroadcastSuiteFailures.csv > Flaky test: BroadcastSuite > -- > > Key: SPARK-10651 > URL: https://issues.apache.org/jira/browse/SPARK-10651 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 1.6.0 >Reporter: Xiangrui Meng >Assignee: Shixiong Zhu >Priority: Blocker > Attachments: BroadcastSuiteFailures.csv > > > Saw many failures recently in master build. See attached CSV for a full list. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10651) Flaky test: BroadcastSuite
Xiangrui Meng created SPARK-10651: - Summary: Flaky test: BroadcastSuite Key: SPARK-10651 URL: https://issues.apache.org/jira/browse/SPARK-10651 Project: Spark Issue Type: Bug Components: Spark Core, Tests Affects Versions: 1.6.0 Reporter: Xiangrui Meng Assignee: Shixiong Zhu Priority: Blocker Saw many failures recently in master build. See attached CSV for a full list. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10625) Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds unserializable objects into connection properties
[ https://issues.apache.org/jira/browse/SPARK-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791252#comment-14791252 ] Peng Cheng commented on SPARK-10625: A pull request has been sent that contains 2 extra unit tests and a simple fix: https://github.com/apache/spark/pull/8785 Can you help me validate it and merge it into 1.5.1? > Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds > unserializable objects into connection properties > -- > > Key: SPARK-10625 > URL: https://issues.apache.org/jira/browse/SPARK-10625 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.1, 1.5.0 > Environment: Ubuntu 14.04 >Reporter: Peng Cheng > Labels: jdbc, spark, sparksql > > Some JDBC drivers (e.g. SAP HANA) tries to optimize connection pooling by > adding new objects into the connection properties, which is then reused by > Spark to be deployed to workers. When some of these new objects are unable to > be serializable it will trigger an org.apache.spark.SparkException: Task not > serializable. The following test code snippet demonstrate this problem by > using a modified H2 driver: > test("INSERT to JDBC Datasource with UnserializableH2Driver") { > object UnserializableH2Driver extends org.h2.Driver { > override def connect(url: String, info: Properties): Connection = { > val result = super.connect(url, info) > info.put("unserializableDriver", this) > result > } > override def getParentLogger: Logger = ??? > } > import scala.collection.JavaConversions._ > val oldDrivers = > DriverManager.getDrivers.filter(_.acceptsURL("jdbc:h2:")).toSeq > oldDrivers.foreach{ > DriverManager.deregisterDriver > } > DriverManager.registerDriver(UnserializableH2Driver) > sql("INSERT INTO TABLE PEOPLE1 SELECT * FROM PEOPLE") > assert(2 === sqlContext.read.jdbc(url1, "TEST.PEOPLE1", properties).count) > assert(2 === sqlContext.read.jdbc(url1, "TEST.PEOPLE1", > properties).collect()(0).length) > DriverManager.deregisterDriver(UnserializableH2Driver) > oldDrivers.foreach{ > DriverManager.registerDriver > } > } -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
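One possible mitigation, sketched here as an assumption rather than as the content of the pull request above: copy only the string-valued entries out of the connection Properties before they are captured by a task closure, so driver-injected objects such as the UnserializableH2Driver in the test never reach the serializer.
{code}
import java.util.Properties
import scala.collection.JavaConverters._

object SerializableJdbcPropsSketch {
  // Returns a Properties object containing only String keys/values; any non-String
  // objects a JDBC driver stuffed into the original map are dropped.
  def stringOnly(props: Properties): Properties = {
    val out = new Properties()
    for (key <- props.stringPropertyNames().asScala) {
      out.setProperty(key, props.getProperty(key))
    }
    out
  }

  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.setProperty("user", "sa")
    props.put("unserializableDriver", new Object()) // mimics the driver's injected object
    println(stringOnly(props).stringPropertyNames()) // [user]
  }
}
{code}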
[jira] [Resolved] (SPARK-9794) ISO DateTime parser is too strict
[ https://issues.apache.org/jira/browse/SPARK-9794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-9794. Resolution: Fixed Fix Version/s: 1.6.0 > ISO DateTime parser is too strict > - > > Key: SPARK-9794 > URL: https://issues.apache.org/jira/browse/SPARK-9794 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.2.2, 1.3.1, 1.4.1, 1.5.0 >Reporter: Alex Angelini >Assignee: Kevin Cox > Fix For: 1.6.0 > > > The DateTime parser requires 3 millisecond digits, but that is not part of > the official ISO8601 spec. > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L132 > https://en.wikipedia.org/wiki/ISO_8601 > This results in the following exception when trying to parse datetime columns > {code} > java.text.ParseException: Unparseable date: "0001-01-01T00:00:00GMT-00:00" > {code} > [~joshrosen] [~rxin] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9794) ISO DateTime parser is too strict
[ https://issues.apache.org/jira/browse/SPARK-9794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-9794: --- Assignee: Kevin Cox > ISO DateTime parser is too strict > - > > Key: SPARK-9794 > URL: https://issues.apache.org/jira/browse/SPARK-9794 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.2.2, 1.3.1, 1.4.1, 1.5.0 >Reporter: Alex Angelini >Assignee: Kevin Cox > Fix For: 1.6.0 > > > The DateTime parser requires 3 millisecond digits, but that is not part of > the official ISO8601 spec. > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L132 > https://en.wikipedia.org/wiki/ISO_8601 > This results in the following exception when trying to parse datetime columns > {code} > java.text.ParseException: Unparseable date: "0001-01-01T00:00:00GMT-00:00" > {code} > [~joshrosen] [~rxin] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
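A small illustration of the strictness being reported, using plain java.text.SimpleDateFormat rather than Spark's actual parser in DateTimeUtils:

{code}
import java.text.SimpleDateFormat

// A pattern that insists on fractional seconds rejects timestamps without them:
val strict = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS")
// strict.parse("0001-01-01T00:00:00")  // throws java.text.ParseException: Unparseable date

// Dropping the mandatory ".SSS" accepts the same input; ISO 8601 does not
// require millisecond digits, which is what the fix accounts for.
val lenient = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss")
lenient.parse("0001-01-01T00:00:00")
{code}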
[jira] [Updated] (SPARK-10623) turning on predicate pushdown throws nonsuch element exception when RDD is empty
[ https://issues.apache.org/jira/browse/SPARK-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-10623: Target Version/s: 1.6.0, 1.5.1 > turning on predicate pushdown throws nonsuch element exception when RDD is > empty > - > > Key: SPARK-10623 > URL: https://issues.apache.org/jira/browse/SPARK-10623 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Ram Sriharsha >Assignee: Zhan Zhang > > Turning on predicate pushdown for ORC datasources results in a > NoSuchElementException: > scala> val df = sqlContext.sql("SELECT name FROM people WHERE age < 15") > df: org.apache.spark.sql.DataFrame = [name: string] > scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "true") > scala> df.explain > == Physical Plan == > java.util.NoSuchElementException > Disabling the pushdown makes things work again: > scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "false") > scala> df.explain > == Physical Plan == > Project [name#6] > Filter (age#7 < 15) > Scan > OrcRelation[file:/home/mydir/spark-1.5.0-SNAPSHOT/test/people][name#6,age#7] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
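For illustration only, a guess at how an empty input can surface as java.util.NoSuchElementException during planning; this is an assumption, not the confirmed root cause addressed by the fix. Taking the head of an empty collection of pushed filters raises exactly this exception, while the Option-returning variant does not:

{code}
// Not Spark's ORC code; just the exception class in isolation.
val pushedFilters: Seq[String] = Seq.empty
// pushedFilters.head                  // java.util.NoSuchElementException: head of empty list
val first = pushedFilters.headOption   // None when nothing was pushed down
{code}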
[jira] [Updated] (SPARK-10650) Spark docs include test and other extra classes
[ https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-10650: - Assignee: Michael Armbrust (was: Andrew Or) > Spark docs include test and other extra classes > --- > > Key: SPARK-10650 > URL: https://issues.apache.org/jira/browse/SPARK-10650 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.0 >Reporter: Patrick Wendell >Assignee: Michael Armbrust >Priority: Critical > > In 1.5.0 there are some extra classes in the Spark docs - including a bunch > of test classes. We need to figure out what commit introduced those and fix > it. The obvious things like genJavadoc version have not changed. > http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ > [before] > http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ > [after] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10623) turning on predicate pushdown throws nonsuch element exception when RDD is empty
[ https://issues.apache.org/jira/browse/SPARK-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791179#comment-14791179 ] Apache Spark commented on SPARK-10623: -- User 'zhzhan' has created a pull request for this issue: https://github.com/apache/spark/pull/8783 > turning on predicate pushdown throws nonsuch element exception when RDD is > empty > - > > Key: SPARK-10623 > URL: https://issues.apache.org/jira/browse/SPARK-10623 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Ram Sriharsha >Assignee: Zhan Zhang > > Turning on predicate pushdown for ORC datasources results in a > NoSuchElementException: > scala> val df = sqlContext.sql("SELECT name FROM people WHERE age < 15") > df: org.apache.spark.sql.DataFrame = [name: string] > scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "true") > scala> df.explain > == Physical Plan == > java.util.NoSuchElementException > Disabling the pushdown makes things work again: > scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "false") > scala> df.explain > == Physical Plan == > Project [name#6] > Filter (age#7 < 15) > Scan > OrcRelation[file:/home/mydir/spark-1.5.0-SNAPSHOT/test/people][name#6,age#7] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10623) turning on predicate pushdown throws nonsuch element exception when RDD is empty
[ https://issues.apache.org/jira/browse/SPARK-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10623: Assignee: Apache Spark (was: Zhan Zhang) > turning on predicate pushdown throws nonsuch element exception when RDD is > empty > - > > Key: SPARK-10623 > URL: https://issues.apache.org/jira/browse/SPARK-10623 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Ram Sriharsha >Assignee: Apache Spark > > Turning on predicate pushdown for ORC datasources results in a > NoSuchElementException: > scala> val df = sqlContext.sql("SELECT name FROM people WHERE age < 15") > df: org.apache.spark.sql.DataFrame = [name: string] > scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "true") > scala> df.explain > == Physical Plan == > java.util.NoSuchElementException > Disabling the pushdown makes things work again: > scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "false") > scala> df.explain > == Physical Plan == > Project [name#6] > Filter (age#7 < 15) > Scan > OrcRelation[file:/home/mydir/spark-1.5.0-SNAPSHOT/test/people][name#6,age#7] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10623) turning on predicate pushdown throws nonsuch element exception when RDD is empty
[ https://issues.apache.org/jira/browse/SPARK-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10623: Assignee: Zhan Zhang (was: Apache Spark) > turning on predicate pushdown throws nonsuch element exception when RDD is > empty > - > > Key: SPARK-10623 > URL: https://issues.apache.org/jira/browse/SPARK-10623 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Ram Sriharsha >Assignee: Zhan Zhang > > Turning on predicate pushdown for ORC datasources results in a > NoSuchElementException: > scala> val df = sqlContext.sql("SELECT name FROM people WHERE age < 15") > df: org.apache.spark.sql.DataFrame = [name: string] > scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "true") > scala> df.explain > == Physical Plan == > java.util.NoSuchElementException > Disabling the pushdown makes things work again: > scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "false") > scala> df.explain > == Physical Plan == > Project [name#6] > Filter (age#7 < 15) > Scan > OrcRelation[file:/home/mydir/spark-1.5.0-SNAPSHOT/test/people][name#6,age#7] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3978) Schema change on Spark-Hive (Parquet file format) table not working
[ https://issues.apache.org/jira/browse/SPARK-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791171#comment-14791171 ] Nilesh Barge commented on SPARK-3978: - I tested with the latest Spark 1.5 release... I got the source (http://www.apache.org/dyn/closer.lua/spark/spark-1.5.0/spark-1.5.0.tgz) and then build with "mvn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests clean package" command... and then ran my original tests... > Schema change on Spark-Hive (Parquet file format) table not working > --- > > Key: SPARK-3978 > URL: https://issues.apache.org/jira/browse/SPARK-3978 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.1.0 >Reporter: Nilesh Barge >Assignee: Alex Rovner > Fix For: 1.5.0 > > > On following releases: > Spark 1.1.0 (built using sbt/sbt -Dhadoop.version=2.2.0 -Phive assembly) , > Apache HDFS 2.2 > Spark job is able to create/add/read data in hive, parquet formatted, tables > using HiveContext. > But, after changing schema, spark job is not able to read data and throws > following exception: > java.lang.ArrayIndexOutOfBoundsException: 2 > at > org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.getStructFieldData(ArrayWritableObjectInspector.java:127) > > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:284) > > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:278) > > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at > scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) > at > scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) > at scala.collection.AbstractIterator.to(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) > at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) > at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) > at org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:774) > at org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:774) > at > org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) > > at > org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) > > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) > at org.apache.spark.scheduler.Task.run(Task.scala:54) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > > at java.lang.Thread.run(Thread.java:744) > code snippet in short: > hiveContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS people_table (name > String, age INT) ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe' > STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat' > OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'"); > hiveContext.sql("INSERT INTO TABLE people_table SELECT name, age 
FROM > temp_table_people1"); > hiveContext.sql("SELECT * FROM people_table"); //Here, data read was > successful. > hiveContext.sql("ALTER TABLE people_table ADD COLUMNS (gender STRING)"); > hiveContext.sql("SELECT * FROM people_table"); //Not able to read existing > data and ArrayIndexOutOfBoundsException is thrown. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10645) Bivariate Statistics for continuous vs. continuous
[ https://issues.apache.org/jira/browse/SPARK-10645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jihong MA updated SPARK-10645: -- Component/s: SQL ML > Bivariate Statistics for continuous vs. continuous > -- > > Key: SPARK-10645 > URL: https://issues.apache.org/jira/browse/SPARK-10645 > Project: Spark > Issue Type: Sub-task > Components: ML, SQL >Reporter: Jihong MA > > This is an umbrella JIRA, which covers Bivariate Statistics for continuous > vs. continuous columns, including covariance, Pearson's correlation, > Spearman's correlation (for both continuous & categorical). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
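For reference, the RDD-based MLlib API already exposes the statistics this umbrella covers; a short sketch (the DataFrame/SQL-side work tracked here is not shown):

{code}
import org.apache.spark.mllib.stat.Statistics
import org.apache.spark.rdd.RDD

// Pearson and Spearman correlation between two numeric columns held as RDD[Double].
def correlations(x: RDD[Double], y: RDD[Double]): (Double, Double) =
  (Statistics.corr(x, y, "pearson"), Statistics.corr(x, y, "spearman"))
{code}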
[jira] [Commented] (SPARK-3978) Schema change on Spark-Hive (Parquet file format) table not working
[ https://issues.apache.org/jira/browse/SPARK-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791150#comment-14791150 ] Alex Rovner commented on SPARK-3978: [~barge.nilesh] What version of Spark have you tested with? > Schema change on Spark-Hive (Parquet file format) table not working > --- > > Key: SPARK-3978 > URL: https://issues.apache.org/jira/browse/SPARK-3978 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.1.0 >Reporter: Nilesh Barge >Assignee: Alex Rovner > Fix For: 1.5.0 > > > On following releases: > Spark 1.1.0 (built using sbt/sbt -Dhadoop.version=2.2.0 -Phive assembly) , > Apache HDFS 2.2 > Spark job is able to create/add/read data in hive, parquet formatted, tables > using HiveContext. > But, after changing schema, spark job is not able to read data and throws > following exception: > java.lang.ArrayIndexOutOfBoundsException: 2 > at > org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.getStructFieldData(ArrayWritableObjectInspector.java:127) > > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:284) > > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:278) > > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at > scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) > at > scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) > at scala.collection.AbstractIterator.to(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) > at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) > at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) > at org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:774) > at org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:774) > at > org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) > > at > org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) > > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) > at org.apache.spark.scheduler.Task.run(Task.scala:54) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > > at java.lang.Thread.run(Thread.java:744) > code snippet in short: > hiveContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS people_table (name > String, age INT) ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe' > STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat' > OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'"); > hiveContext.sql("INSERT INTO TABLE people_table SELECT name, age FROM > temp_table_people1"); > hiveContext.sql("SELECT * FROM people_table"); //Here, data read was > successful. 
> hiveContext.sql("ALTER TABLE people_table ADD COLUMNS (gender STRING)"); > hiveContext.sql("SELECT * FROM people_table"); //Not able to read existing > data and ArrayIndexOutOfBoundsException is thrown. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3978) Schema change on Spark-Hive (Parquet file format) table not working
[ https://issues.apache.org/jira/browse/SPARK-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791141#comment-14791141 ] Nilesh Barge commented on SPARK-3978: - Thanks for resolving this, I also verified on my end and now it is working fine > Schema change on Spark-Hive (Parquet file format) table not working > --- > > Key: SPARK-3978 > URL: https://issues.apache.org/jira/browse/SPARK-3978 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.1.0 >Reporter: Nilesh Barge >Assignee: Alex Rovner > Fix For: 1.5.0 > > > On following releases: > Spark 1.1.0 (built using sbt/sbt -Dhadoop.version=2.2.0 -Phive assembly) , > Apache HDFS 2.2 > Spark job is able to create/add/read data in hive, parquet formatted, tables > using HiveContext. > But, after changing schema, spark job is not able to read data and throws > following exception: > java.lang.ArrayIndexOutOfBoundsException: 2 > at > org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.getStructFieldData(ArrayWritableObjectInspector.java:127) > > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:284) > > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:278) > > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at > scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) > at > scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) > at scala.collection.AbstractIterator.to(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) > at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) > at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) > at org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:774) > at org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:774) > at > org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) > > at > org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) > > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) > at org.apache.spark.scheduler.Task.run(Task.scala:54) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > > at java.lang.Thread.run(Thread.java:744) > code snippet in short: > hiveContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS people_table (name > String, age INT) ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe' > STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat' > OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'"); > hiveContext.sql("INSERT INTO TABLE people_table SELECT name, age FROM > temp_table_people1"); > hiveContext.sql("SELECT * FROM people_table"); //Here, data read was > successful. 
> hiveContext.sql("ALTER TABLE people_table ADD COLUMNS (gender STRING)"); > hiveContext.sql("SELECT * FROM people_table"); //Not able to read existing > data and ArrayIndexOutOfBoundsException is thrown. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10626) Create a Java friendly method for randomRDD & RandomDataGenerator on RandomRDDs.
[ https://issues.apache.org/jira/browse/SPARK-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10626: Assignee: Apache Spark > Create a Java friendly method for randomRDD & RandomDataGenerator on > RandomRDDs. > > > Key: SPARK-10626 > URL: https://issues.apache.org/jira/browse/SPARK-10626 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: holdenk >Assignee: Apache Spark >Priority: Minor > > SPARK-3136 added a large number of functions for creating Java RandomRDDs, > but for people that want to use custom RandomDataGenerators we should make a > Java friendly method. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10626) Create a Java friendly method for randomRDD & RandomDataGenerator on RandomRDDs.
[ https://issues.apache.org/jira/browse/SPARK-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10626: Assignee: (was: Apache Spark) > Create a Java friendly method for randomRDD & RandomDataGenerator on > RandomRDDs. > > > Key: SPARK-10626 > URL: https://issues.apache.org/jira/browse/SPARK-10626 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: holdenk >Priority: Minor > > SPARK-3136 added a large number of functions for creating Java RandomRDDs, > but for people that want to use custom RandomDataGenerators we should make a > Java friendly method. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10626) Create a Java friendly method for randomRDD & RandomDataGenerator on RandomRDDs.
[ https://issues.apache.org/jira/browse/SPARK-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791128#comment-14791128 ] Apache Spark commented on SPARK-10626: -- User 'holdenk' has created a pull request for this issue: https://github.com/apache/spark/pull/8782 > Create a Java friendly method for randomRDD & RandomDataGenerator on > RandomRDDs. > > > Key: SPARK-10626 > URL: https://issues.apache.org/jira/browse/SPARK-10626 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: holdenk >Priority: Minor > > SPARK-3136 added a large number of functions for creating Java RandomRDDs, > but for people that want to use custom RandomDataGenerators we should make a > Java friendly method. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
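A sketch of the Scala-side usage the issue wants mirrored for Java callers. The generator class below is made up; RandomRDDs.randomRDD and RandomDataGenerator are the existing 1.x MLlib API, and the proposed Java-friendly wrapper itself is not shown:

{code}
import org.apache.spark.SparkContext
import org.apache.spark.mllib.random.{RandomDataGenerator, RandomRDDs}

// A custom generator: 0/1 coin flips.
class CoinFlipGenerator extends RandomDataGenerator[Int] {
  private val rng = new java.util.Random()
  override def nextValue(): Int = if (rng.nextBoolean()) 1 else 0
  override def setSeed(seed: Long): Unit = rng.setSeed(seed)
  override def copy(): CoinFlipGenerator = new CoinFlipGenerator
}

// Scala callers can pass the generator straight to randomRDD; the JIRA asks
// for an equally convenient entry point for Java callers.
def flips(sc: SparkContext) =
  RandomRDDs.randomRDD(sc, new CoinFlipGenerator, size = 1000L, numPartitions = 4, seed = 42L)
{code}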
[jira] [Updated] (SPARK-10650) Spark docs include test and other extra classes
[ https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-10650: Target Version/s: 1.5.1 > Spark docs include test and other extra classes > --- > > Key: SPARK-10650 > URL: https://issues.apache.org/jira/browse/SPARK-10650 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.0 >Reporter: Patrick Wendell >Assignee: Andrew Or >Priority: Critical > > In 1.5.0 there are some extra classes in the Spark docs - including a bunch > of test classes. We need to figure out what commit introduced those and fix > it. The obvious things like genJavadoc version have not changed. > http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ > [before] > http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ > [after] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10650) Spark docs include test and other extra classes
[ https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-10650: Priority: Critical (was: Major) > Spark docs include test and other extra classes > --- > > Key: SPARK-10650 > URL: https://issues.apache.org/jira/browse/SPARK-10650 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.0 >Reporter: Patrick Wendell >Assignee: Andrew Or >Priority: Critical > > In 1.5.0 there are some extra classes in the Spark docs - including a bunch > of test classes. We need to figure out what commit introduced those and fix > it. The obvious things like genJavadoc version have not changed. > http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ > [before] > http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ > [after] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10650) Spark docs include test and other extra classes
[ https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-10650: Description: In 1.5.0 there are some extra classes in the Spark docs - including a bunch of test classes. We need to figure out what commit introduced those and fix it. The obvious things like genJavadoc version have not changed. http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ [before] http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ [after] > Spark docs include test and other extra classes > --- > > Key: SPARK-10650 > URL: https://issues.apache.org/jira/browse/SPARK-10650 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.0 >Reporter: Patrick Wendell >Assignee: Andrew Or > > In 1.5.0 there are some extra classes in the Spark docs - including a bunch > of test classes. We need to figure out what commit introduced those and fix > it. The obvious things like genJavadoc version have not changed. > http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ > [before] > http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ > [after] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10650) Spark docs include test and other extra classes
[ https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-10650: Affects Version/s: 1.5.0 > Spark docs include test and other extra classes > --- > > Key: SPARK-10650 > URL: https://issues.apache.org/jira/browse/SPARK-10650 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.0 >Reporter: Patrick Wendell >Assignee: Andrew Or > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10650) Spark docs include test and other extra classes
Patrick Wendell created SPARK-10650: --- Summary: Spark docs include test and other extra classes Key: SPARK-10650 URL: https://issues.apache.org/jira/browse/SPARK-10650 Project: Spark Issue Type: Bug Components: Documentation Reporter: Patrick Wendell Assignee: Andrew Or -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10649) Streaming jobs unexpectedly inherits job group, job descriptions from context starting thread
[ https://issues.apache.org/jira/browse/SPARK-10649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10649: Assignee: Apache Spark (was: Tathagata Das) > Streaming jobs unexpectedly inherits job group, job descriptions from context > starting thread > - > > Key: SPARK-10649 > URL: https://issues.apache.org/jira/browse/SPARK-10649 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.3.1, 1.4.1, 1.5.0 >Reporter: Tathagata Das >Assignee: Apache Spark > > The job group, job descriptions and scheduler pool information is passed > through thread local properties, and get inherited by child threads. In case > of spark streaming, the streaming jobs inherit these properties from the > thread that called streamingContext.start(). This may not make sense. > 1. Job group: This is mainly used for cancelling a group of jobs together. It > does not make sense to cancel streaming jobs like this, as the effect will be > unpredictable. And its not a valid usecase any way, to cancel a streaming > context, call streamingContext.stop() > 2. Job description: This is used to pass on nice text descriptions for jobs > to show up in the UI. The job description of the thread that calls > streamingContext.start() is not useful for all the streaming jobs, as it does > not make sense for all of the streaming jobs to have the same description, > and the description may or may not be related to streaming. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10649) Streaming jobs unexpectedly inherits job group, job descriptions from context starting thread
[ https://issues.apache.org/jira/browse/SPARK-10649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10649: Assignee: Tathagata Das (was: Apache Spark) > Streaming jobs unexpectedly inherits job group, job descriptions from context > starting thread > - > > Key: SPARK-10649 > URL: https://issues.apache.org/jira/browse/SPARK-10649 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.3.1, 1.4.1, 1.5.0 >Reporter: Tathagata Das >Assignee: Tathagata Das > > The job group, job descriptions and scheduler pool information is passed > through thread local properties, and get inherited by child threads. In case > of spark streaming, the streaming jobs inherit these properties from the > thread that called streamingContext.start(). This may not make sense. > 1. Job group: This is mainly used for cancelling a group of jobs together. It > does not make sense to cancel streaming jobs like this, as the effect will be > unpredictable. And its not a valid usecase any way, to cancel a streaming > context, call streamingContext.stop() > 2. Job description: This is used to pass on nice text descriptions for jobs > to show up in the UI. The job description of the thread that calls > streamingContext.start() is not useful for all the streaming jobs, as it does > not make sense for all of the streaming jobs to have the same description, > and the description may or may not be related to streaming. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10649) Streaming jobs unexpectedly inherits job group, job descriptions from context starting thread
[ https://issues.apache.org/jira/browse/SPARK-10649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791109#comment-14791109 ] Apache Spark commented on SPARK-10649: -- User 'tdas' has created a pull request for this issue: https://github.com/apache/spark/pull/8781 > Streaming jobs unexpectedly inherits job group, job descriptions from context > starting thread > - > > Key: SPARK-10649 > URL: https://issues.apache.org/jira/browse/SPARK-10649 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.3.1, 1.4.1, 1.5.0 >Reporter: Tathagata Das >Assignee: Tathagata Das > > The job group, job descriptions and scheduler pool information is passed > through thread local properties, and get inherited by child threads. In case > of spark streaming, the streaming jobs inherit these properties from the > thread that called streamingContext.start(). This may not make sense. > 1. Job group: This is mainly used for cancelling a group of jobs together. It > does not make sense to cancel streaming jobs like this, as the effect will be > unpredictable. And its not a valid usecase any way, to cancel a streaming > context, call streamingContext.stop() > 2. Job description: This is used to pass on nice text descriptions for jobs > to show up in the UI. The job description of the thread that calls > streamingContext.start() is not useful for all the streaming jobs, as it does > not make sense for all of the streaming jobs to have the same description, > and the description may or may not be related to streaming. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10646) Bivariate Statistics: Pearson's Chi-Squared Test for categorical vs. categorical
[ https://issues.apache.org/jira/browse/SPARK-10646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jihong MA updated SPARK-10646: -- Component/s: SQL ML > Bivariate Statistics: Pearson's Chi-Squared Test for categorical vs. > categorical > > > Key: SPARK-10646 > URL: https://issues.apache.org/jira/browse/SPARK-10646 > Project: Spark > Issue Type: Sub-task > Components: ML, SQL >Reporter: Jihong MA > > Pearson's chi-squared goodness of fit test for observed against the expected > distribution. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
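For reference, a sketch with the existing RDD-based MLlib API; the SQL-side variant this sub-task proposes is not shown:

{code}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.stat.Statistics

// Goodness-of-fit of observed counts against the default (uniform) expected
// distribution; the result carries the statistic, degrees of freedom and p-value.
val observed = Vectors.dense(4.0, 6.0, 5.0)
val result = Statistics.chiSqTest(observed)
println(s"statistic=${result.statistic}, pValue=${result.pValue}")
{code}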
[jira] [Created] (SPARK-10649) Streaming jobs unexpectedly inherits job group, job descriptions from context starting thread
Tathagata Das created SPARK-10649: - Summary: Streaming jobs unexpectedly inherits job group, job descriptions from context starting thread Key: SPARK-10649 URL: https://issues.apache.org/jira/browse/SPARK-10649 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.5.0, 1.4.1, 1.3.1 Reporter: Tathagata Das Assignee: Tathagata Das The job group, job descriptions and scheduler pool information is passed through thread local properties, and get inherited by child threads. In case of spark streaming, the streaming jobs inherit these properties from the thread that called streamingContext.start(). This may not make sense. 1. Job group: This is mainly used for cancelling a group of jobs together. It does not make sense to cancel streaming jobs like this, as the effect will be unpredictable. And its not a valid usecase any way, to cancel a streaming context, call streamingContext.stop() 2. Job description: This is used to pass on nice text descriptions for jobs to show up in the UI. The job description of the thread that calls streamingContext.start() is not useful for all the streaming jobs, as it does not make sense for all of the streaming jobs to have the same description, and the description may or may not be related to streaming. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
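For applications that need to sidestep the inheritance described above today, one caller-side workaround is sketched below; this is an assumption about usage, not necessarily what the pull request referenced above does. Clearing the thread-local properties on the thread that starts the context keeps the streaming jobs from inheriting them:

{code}
import org.apache.spark.SparkContext
import org.apache.spark.streaming.StreamingContext

// Clear job group/description and scheduler pool on this thread before
// start(), so jobs generated by the streaming context do not inherit them.
def startWithoutInheritedProps(sc: SparkContext, ssc: StreamingContext): Unit = {
  sc.clearJobGroup()                                 // drops the group id and description
  sc.setLocalProperty("spark.scheduler.pool", null)  // revert to the default pool
  ssc.start()
}
{code}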
[jira] [Assigned] (SPARK-10648) Spark-SQL JDBC fails to set a default precision and scale when they are not defined in an oracle schema.
[ https://issues.apache.org/jira/browse/SPARK-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10648: Assignee: (was: Apache Spark) > Spark-SQL JDBC fails to set a default precision and scale when they are not > defined in an oracle schema. > > > Key: SPARK-10648 > URL: https://issues.apache.org/jira/browse/SPARK-10648 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.0 > Environment: using oracle 11g, ojdbc7.jar >Reporter: Travis Hegner > > Using oracle 11g as a datasource with ojdbc7.jar. When importing data into a > scala app, I am getting an exception "Overflowed precision". Some times I > would get the exception "Unscaled value too large for precision". > This issue likely affects older versions as well, but this was the version I > verified it on. > I narrowed it down to the fact that the schema detection system was trying to > set the precision to 0, and the scale to -127. > I have a proposed pull request to follow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10648) Spark-SQL JDBC fails to set a default precision and scale when they are not defined in an oracle schema.
[ https://issues.apache.org/jira/browse/SPARK-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791081#comment-14791081 ] Apache Spark commented on SPARK-10648: -- User 'travishegner' has created a pull request for this issue: https://github.com/apache/spark/pull/8780 > Spark-SQL JDBC fails to set a default precision and scale when they are not > defined in an oracle schema. > > > Key: SPARK-10648 > URL: https://issues.apache.org/jira/browse/SPARK-10648 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.0 > Environment: using oracle 11g, ojdbc7.jar >Reporter: Travis Hegner > > Using oracle 11g as a datasource with ojdbc7.jar. When importing data into a > scala app, I am getting an exception "Overflowed precision". Some times I > would get the exception "Unscaled value too large for precision". > This issue likely affects older versions as well, but this was the version I > verified it on. > I narrowed it down to the fact that the schema detection system was trying to > set the precision to 0, and the scale to -127. > I have a proposed pull request to follow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10648) Spark-SQL JDBC fails to set a default precision and scale when they are not defined in an oracle schema.
[ https://issues.apache.org/jira/browse/SPARK-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10648: Assignee: Apache Spark > Spark-SQL JDBC fails to set a default precision and scale when they are not > defined in an oracle schema. > > > Key: SPARK-10648 > URL: https://issues.apache.org/jira/browse/SPARK-10648 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.0 > Environment: using oracle 11g, ojdbc7.jar >Reporter: Travis Hegner >Assignee: Apache Spark > > Using oracle 11g as a datasource with ojdbc7.jar. When importing data into a > scala app, I am getting an exception "Overflowed precision". Some times I > would get the exception "Unscaled value too large for precision". > This issue likely affects older versions as well, but this was the version I > verified it on. > I narrowed it down to the fact that the schema detection system was trying to > set the precision to 0, and the scale to -127. > I have a proposed pull request to follow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10648) Spark-SQL JDBC fails to set a default precision and scale when they are not defined in an oracle schema.
Travis Hegner created SPARK-10648: - Summary: Spark-SQL JDBC fails to set a default precision and scale when they are not defined in an oracle schema. Key: SPARK-10648 URL: https://issues.apache.org/jira/browse/SPARK-10648 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Environment: using oracle 11g, ojdbc7.jar Reporter: Travis Hegner Using oracle 11g as a datasource with ojdbc7.jar. When importing data into a scala app, I am getting an exception "Overflowed precision". Some times I would get the exception "Unscaled value too large for precision". This issue likely affects older versions as well, but this was the version I verified it on. I narrowed it down to the fact that the schema detection system was trying to set the precision to 0, and the scale to -127. I have a proposed pull request to follow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
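One possible client-side workaround, sketched under the assumption that the undefined Oracle NUMBER columns are reported with precision 0 as described above; the submitted pull request may take a different approach. It registers a JdbcDialect that supplies a fallback DecimalType, where DecimalType(38, 10) is an arbitrary choice for this sketch:

{code}
import java.sql.Types
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}
import org.apache.spark.sql.types.{DataType, DecimalType, MetadataBuilder}

object LenientOracleDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:oracle")

  override def getCatalystType(
      sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] = {
    // `size` is the reported precision; 0 means it was left undefined in the schema.
    if (sqlType == Types.NUMERIC && size == 0) Some(DecimalType(38, 10)) else None
  }
}

// Register once from the application's setup code, before reading the table.
JdbcDialects.registerDialect(LenientOracleDialect)
{code}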
[jira] [Resolved] (SPARK-6504) Cannot read Parquet files generated from different versions at once
[ https://issues.apache.org/jira/browse/SPARK-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-6504. - Resolution: Fixed Fix Version/s: 1.3.1 This should be fixed. Please reopen if you are still having problems. > Cannot read Parquet files generated from different versions at once > --- > > Key: SPARK-6504 > URL: https://issues.apache.org/jira/browse/SPARK-6504 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.2.1 >Reporter: Marius Soutier > Fix For: 1.3.1 > > > When trying to read Parquet files generated by Spark 1.1.1 and 1.2.1 at the > same time via > `sqlContext.parquetFile("fileFrom1.1.parqut,fileFrom1.2.parquet")` an > exception occurs: > could not merge metadata: key org.apache.spark.sql.parquet.row.metadata has > conflicting values: > [{"type":"struct","fields":[{"name":"date","type":"string","nullable":true,"metadata":{}},{"name":"account","type":"string","nullable":true,"metadata":{}},{"name":"impressions","type":"long","nullable":false,"metadata":{}},{"name":"cost","type":"double","nullable":false,"metadata":{}},{"name":"clicks","type":"long","nullable":false,"metadata":{}},{"name":"conversions","type":"long","nullable":false,"metadata":{}},{"name":"orderValue","type":"double","nullable":false,"metadata":{}}]}, > StructType(List(StructField(date,StringType,true), > StructField(account,StringType,true), > StructField(impressions,LongType,false), StructField(cost,DoubleType,false), > StructField(clicks,LongType,false), StructField(conversions,LongType,false), > StructField(orderValue,DoubleType,false)))] > The Schema is exactly equal. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
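For anyone reading mixed-version Parquet output on newer releases, a sketch with the DataFrame reader; it assumes Spark 1.4+ for the reader API, the mergeSchema option is a 1.5 addition, and the file names are the placeholders from the report:

{code}
// Read both files in one call and request explicit schema reconciliation.
val df = sqlContext.read
  .option("mergeSchema", "true")
  .parquet("fileFrom1.1.parquet", "fileFrom1.2.parquet")
{code}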
[jira] [Resolved] (SPARK-6086) Exceptions in DAGScheduler.updateAccumulators
[ https://issues.apache.org/jira/browse/SPARK-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-6086. --- Resolution: Cannot Reproduce Resolving as "cannot reproduce" for now, pending updates. > Exceptions in DAGScheduler.updateAccumulators > - > > Key: SPARK-6086 > URL: https://issues.apache.org/jira/browse/SPARK-6086 > Project: Spark > Issue Type: Bug > Components: Scheduler, Spark Core, SQL >Affects Versions: 1.3.0 >Reporter: Kai Zeng >Priority: Critical > > Class Cast Exceptions in DAGScheduler.updateAccumulators, when DAGScheduler > is collecting status from tasks. These exceptions happen occasionally, > especially when there are many stages in a job. > Application code: > https://github.com/kai-zeng/spark/blob/accum-bug/examples/src/main/scala/org/apache/spark/examples/sql/hive/SQLSuite.scala > Script used: ./bin/spark-submit --class > org.apache.spark.examples.sql.hive.SQLSuite > examples/target/scala-2.10/spark-examples-1.3.0-SNAPSHOT-hadoop1.0.4.jar > benchmark-cache 6 > There are two types of error messages: > {code} > java.lang.ClassCastException: scala.None$ cannot be cast to > scala.collection.TraversableOnce > at > org.apache.spark.GrowableAccumulableParam.addInPlace(Accumulators.scala:188) > at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:82) > at > org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:340) > at > org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:335) > at > scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) > at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) > at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) > at > scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) > at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) > at scala.collection.mutable.HashMap.foreach(HashMap.scala:98) > at > scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) > at org.apache.spark.Accumulators$.add(Accumulators.scala:335) > at > org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:892) > at > org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1000) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1398) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362) > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) > {code} > {code} > java.lang.ClassCastException: scala.None$ cannot be cast to java.lang.Integer > at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106) > at > org.apache.spark.AccumulatorParam$IntAccumulatorParam$.addInPlace(Accumulators.scala:263) > at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:82) > at > org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:340) > at > org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:335) > at > scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) > at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) > at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) > at > scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) > at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) > at 
scala.collection.mutable.HashMap.foreach(HashMap.scala:98) > at > scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) > at org.apache.spark.Accumulators$.add(Accumulators.scala:335) > at > org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:892) > at > org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1000) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1398) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362) > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-10050) Support collecting data of MapType in DataFrame
[ https://issues.apache.org/jira/browse/SPARK-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman resolved SPARK-10050. --- Resolution: Fixed Fix Version/s: 1.6.0 Issue resolved by pull request 8711 [https://github.com/apache/spark/pull/8711] > Support collecting data of MapType in DataFrame > --- > > Key: SPARK-10050 > URL: https://issues.apache.org/jira/browse/SPARK-10050 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Reporter: Sun Rui > Fix For: 1.6.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org