[jira] [Commented] (SPARK-7841) Spark build should not use lib_managed for dependencies
[ https://issues.apache.org/jira/browse/SPARK-7841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791674#comment-14791674 ] Iulian Dragos commented on SPARK-7841: -- Yes, there are a few build scripts (including make-distribution IIRC) that depend on having things in `lib_managed`. For the moment I'm applying a patch locally; I hope to have some time to look at this in the next week or two. > Spark build should not use lib_managed for dependencies > --- > > Key: SPARK-7841 > URL: https://issues.apache.org/jira/browse/SPARK-7841 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 1.3.1 >Reporter: Iulian Dragos > Labels: easyfix, sbt > > - unnecessary duplication (I will have those libraries under ./m2, via maven > anyway) > - every time I call make-distribution I lose lib_managed (via mvn clean > install) and have to wait to download all jars again the next time I use sbt > - Eclipse does not handle relative paths very well (source attachments from > lib_managed don’t always work) > - it's not the default configuration. If we stray from defaults I think there > should be a clear advantage. > Digging through history, the only reference to `retrieveManaged := true` I > found was in f686e3d, from July 2011 ("Initial work on converting build to > SBT 0.10.1"). My guess is that this is purely an accident of porting the build from > sbt 0.7.x and trying to keep the old project layout. > If there are reasons for keeping it, please comment (I didn't get any answers > on the [dev mailing > list|http://apache-spark-developers-list.1001551.n3.nabble.com/Why-use-quot-lib-managed-quot-for-the-Sbt-build-td12361.html]) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
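For context, `retrieveManaged` is the stock sbt setting this ticket is about. A minimal sketch of the relevant build-definition line, shown against a generic hypothetical sbt project rather than Spark's actual SparkBuild.scala:
{code}
// Minimal build.sbt sketch for a hypothetical project (not Spark's real build).
// With retrieveManaged := true, sbt copies every resolved dependency into lib_managed/;
// the sbt default (false) serves jars straight from the Ivy/Maven caches instead.
lazy val example = (project in file("."))
  .settings(
    name := "example",
    retrieveManaged := false // sbt's default; Spark's build currently overrides this to true
  )
{code}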
[jira] [Assigned] (SPARK-10051) Support collecting data of StructType in DataFrame
[ https://issues.apache.org/jira/browse/SPARK-10051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10051: Assignee: (was: Apache Spark) > Support collecting data of StructType in DataFrame > -- > > Key: SPARK-10051 > URL: https://issues.apache.org/jira/browse/SPARK-10051 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Reporter: Sun Rui > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10051) Support collecting data of StructType in DataFrame
[ https://issues.apache.org/jira/browse/SPARK-10051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10051: Assignee: Apache Spark > Support collecting data of StructType in DataFrame > -- > > Key: SPARK-10051 > URL: https://issues.apache.org/jira/browse/SPARK-10051 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Reporter: Sun Rui >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10051) Support collecting data of StructType in DataFrame
[ https://issues.apache.org/jira/browse/SPARK-10051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791617#comment-14791617 ] Apache Spark commented on SPARK-10051: -- User 'sun-rui' has created a pull request for this issue: https://github.com/apache/spark/pull/8794 > Support collecting data of StructType in DataFrame > -- > > Key: SPARK-10051 > URL: https://issues.apache.org/jira/browse/SPARK-10051 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Reporter: Sun Rui > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10659) DataFrames and SparkSQL saveAsParquetFile does not preserve REQUIRED (not nullable) flag in schema
[ https://issues.apache.org/jira/browse/SPARK-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791613#comment-14791613 ] Vladimir Picka commented on SPARK-10659: Here is an unanswered attempt at discussion on the mailing list: https://mail.google.com/mail/#search/label%3Aspark-user+petr/14f64c75c15f5ccd > DataFrames and SparkSQL saveAsParquetFile does not preserve REQUIRED (not > nullable) flag in schema > -- > > Key: SPARK-10659 > URL: https://issues.apache.org/jira/browse/SPARK-10659 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.3.0, 1.3.1, 1.4.0, 1.4.1, 1.5.0 >Reporter: Vladimir Picka > > DataFrames currently automatically promote all Parquet schema fields to > optional when they are written to an empty directory. The problem remains in > v1.5.0. > The culprit is this code: > val relation = if (doInsertion) { > // This is a hack. We always set > nullable/containsNull/valueContainsNull to true > // for the schema of a parquet data. > val df = > sqlContext.createDataFrame( > data.queryExecution.toRdd, > data.schema.asNullable) > val createdRelation = > createRelation(sqlContext, parameters, > df.schema).asInstanceOf[ParquetRelation2] > createdRelation.insert(df, overwrite = mode == SaveMode.Overwrite) > createdRelation > } > which was implemented as part of this PR: > https://github.com/apache/spark/commit/1b490e91fd6b5d06d9caeb50e597639ccfc0bc3b > This is very unexpected behaviour for some use cases where files are read from > one place and written to another, like small-file packing - it ends up with > incompatible files because required can't normally be promoted to optional. > It is the essence of a schema that it enforces the "required" invariant on data. It > should be assumed that this is intended. > I believe that a better approach is to have the default behaviour keep the schema > as is and provide e.g. a builder method or option to allow forcing fields to > optional. > Right now we have to override a private API so that our files are rewritten as > is, with all its perils. > Vladimir -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10659) DataFrames and SparkSQL saveAsParquetFile does not preserve REQUIRED (not nullable) flag in schema
[ https://issues.apache.org/jira/browse/SPARK-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Picka updated SPARK-10659: --- Summary: DataFrames and SparkSQL saveAsParquetFile does not preserve REQUIRED (not nullable) flag in schema (was: SparkSQL saveAsParquetFile does not preserve REQUIRED (not nullable) flag in schema) > DataFrames and SparkSQL saveAsParquetFile does not preserve REQUIRED (not > nullable) flag in schema > -- > > Key: SPARK-10659 > URL: https://issues.apache.org/jira/browse/SPARK-10659 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.3.0, 1.3.1, 1.4.0, 1.4.1, 1.5.0 >Reporter: Vladimir Picka > > DataFrames currently automatically promote all Parquet schema fields to > optional when they are written to an empty directory. The problem remains in > v1.5.0. > The culprit is this code: > val relation = if (doInsertion) { > // This is a hack. We always set > nullable/containsNull/valueContainsNull to true > // for the schema of a parquet data. > val df = > sqlContext.createDataFrame( > data.queryExecution.toRdd, > data.schema.asNullable) > val createdRelation = > createRelation(sqlContext, parameters, > df.schema).asInstanceOf[ParquetRelation2] > createdRelation.insert(df, overwrite = mode == SaveMode.Overwrite) > createdRelation > } > which was implemented as part of this PR: > https://github.com/apache/spark/commit/1b490e91fd6b5d06d9caeb50e597639ccfc0bc3b > This is very unexpected behaviour for some use cases where files are read from > one place and written to another, like small-file packing - it ends up with > incompatible files because required can't normally be promoted to optional. > It is the essence of a schema that it enforces the "required" invariant on data. It > should be assumed that this is intended. > I believe that a better approach is to have the default behaviour keep the schema > as is and provide e.g. a builder method or option to allow forcing fields to > optional. > Right now we have to override a private API so that our files are rewritten as > is, with all its perils. > Vladimir -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10659) SparkSQL saveAsParquetFile does not preserve REQUIRED (not nullable) flag in schema
Vladimir Picka created SPARK-10659: -- Summary: SparkSQL saveAsParquetFile does not preserve REQUIRED (not nullable) flag in schema Key: SPARK-10659 URL: https://issues.apache.org/jira/browse/SPARK-10659 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.5.0, 1.4.1, 1.4.0, 1.3.1, 1.3.0 Reporter: Vladimir Picka DataFrames currently automatically promote all Parquet schema fields to optional when they are written to an empty directory. The problem remains in v1.5.0. The culprit is this code: val relation = if (doInsertion) { // This is a hack. We always set nullable/containsNull/valueContainsNull to true // for the schema of a parquet data. val df = sqlContext.createDataFrame( data.queryExecution.toRdd, data.schema.asNullable) val createdRelation = createRelation(sqlContext, parameters, df.schema).asInstanceOf[ParquetRelation2] createdRelation.insert(df, overwrite = mode == SaveMode.Overwrite) createdRelation } which was implemented as part of this PR: https://github.com/apache/spark/commit/1b490e91fd6b5d06d9caeb50e597639ccfc0bc3b This is very unexpected behaviour for some use cases where files are read from one place and written to another, like small-file packing - it ends up with incompatible files because required can't normally be promoted to optional. It is the essence of a schema that it enforces the "required" invariant on data. It should be assumed that this is intended. I believe that a better approach is to have the default behaviour keep the schema as is and provide e.g. a builder method or option to allow forcing fields to optional. Right now we have to override a private API so that our files are rewritten as is, with all its perils. Vladimir -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
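To make the reported behaviour concrete, here is a small reproduction sketch, assuming a 1.4/1.5-era SQLContext and an illustrative output path (this is not code from the ticket): a field declared non-nullable comes back from Parquet as nullable.
{code}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Assumes an existing SparkContext `sc`; the output path below is illustrative.
val sqlContext = new SQLContext(sc)
val schema = StructType(Seq(StructField("id", StringType, nullable = false)))
val df = sqlContext.createDataFrame(sc.parallelize(Seq(Row("a"), Row("b"))), schema)

df.write.parquet("/tmp/required-demo") // the REQUIRED field is written as optional
val readBack = sqlContext.read.parquet("/tmp/required-demo")
readBack.schema.foreach(f => println(s"${f.name} nullable=${f.nullable}")) // prints nullable=true
{code}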
[jira] [Updated] (SPARK-10658) Could pyspark provide addJars() as scala spark API?
[ https://issues.apache.org/jira/browse/SPARK-10658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ryanchou updated SPARK-10658: - Description: My spark program was written with the pyspark API, and it uses the spark-csv jar library. I can submit the task with spark-submit, and add the `--jars` argument to use the spark-csv jar library, as in the following command: ``` /bin/spark-submit --jars /path/spark-csv_2.10-1.1.0.jar xxx.py ``` However I need to run my unit tests like: ``` py.test -vvs test_xxx.py ``` It couldn't add jars via the '--jars' argument. Therefore I tried to use the SparkContext.addPyFile() API to add jars in my test_xxx.py. Because I saw that addPyFile()'s doc mentions PACKAGES_EXTENSIONS = (.zip, .py, .jar). Does it mean that I can add *.jar (jar libraries) by using addPyFile()? The code which uses addPyFile() to add jars is below: ``` self.sc.addPyFile(join(lib_path, "spark-csv_2.10-1.1.0.jar")) sqlContext = SQLContext(self.sc) self.dataframe = sqlContext.load( source="com.databricks.spark.csv", header="true", path="xxx.csv" ) ``` However it doesn't work: sqlContext cannot load the source (com.databricks.spark.csv). Eventually I found another way: setting the environment variable SPARK_CLASSPATH to load the jar libraries ``` SPARK_CLASSPATH="/path/xxx.jar:/path/xxx2.jar" py.test -vvs test_xxx.py ``` It loads the jar libraries and sqlContext loads the source successfully, just like adding the `--jars xxx1.jar` argument. This is the situation of using third-party jars (.py & .egg work well by using addPyFile()) in pyspark-written scripts, where `--jars` cannot be used (py.test -vvs test_xxx.py). Have you ever planned to provide an API such as addJars() in Scala for adding jars to a Spark program, or is there a better way to add jars that I still haven't found? If someone wants to add jars in pyspark-written scripts without using '--jars', could you give us some suggestions? was: My spark program was written with the pyspark API, and it uses the spark-csv jar library. I can submit the task with spark-submit, and add the `--jars` argument to use the spark-csv jar library, as in the following command: ``` /bin/spark-submit --jars /path/spark-csv_2.10-1.1.0.jar xxx.py ``` However I need to run my unit tests like: ``` py.test -vvs test_xxx.py ``` It couldn't add jars via the '--jars' argument. Therefore I tried to use the SparkContext.addPyFile() API to add jars in my test_xxx.py. Because I saw that addPyFile()'s doc mentions PACKAGES_EXTENSIONS = (.zip, .py, .jar). Does it mean that I can add *.jar (jar libraries) by using addPyFile()? The code which uses addPyFile() to add jars is below: ``` self.sc.addPyFile(join(lib_path, "spark-csv_2.10-1.1.0.jar")) sqlContext = SQLContext(self.sc) self.dataframe = sqlContext.load( source="com.databricks.spark.csv", header="true", path="xxx.csv" ) ``` However it doesn't work: sqlContext cannot load the source (com.databricks.spark.csv). Eventually I found another way: setting the environment variable SPARK_CLASSPATH to load the jar libraries ``` SPARK_CLASSPATH="/path/xxx.jar:/path/xxx2.jar" py.test -vvs test_xxx.py ``` It loads the jar libraries and sqlContext loads the source successfully, just like adding the `--jars xxx1.jar` argument. This is the situation of using third-party jars (.py & .egg work well by using addPyFile()) in pyspark-written scripts, where `--jars` cannot be used (py.test -vvs test_xxx.py). 
Have you ever planned to provide an API such as addJars() in Scala for adding jars to a Spark program, or is there a better way to add jars that I still haven't found? > Could pyspark provide addJars() as scala spark API? > > > Key: SPARK-10658 > URL: https://issues.apache.org/jira/browse/SPARK-10658 > Project: Spark > Issue Type: Wish > Components: PySpark >Affects Versions: 1.3.1 > Environment: Linux ubuntu 14.01 LTS >Reporter: ryanchou > Labels: features > Original Estimate: 48h > Remaining Estimate: 48h > > My spark program was written with the pyspark API, and it uses the spark-csv > jar library. > I can submit the task with spark-submit, and add the `--jars` argument to use the > spark-csv jar library, as in the following command: > ``` > /bin/spark-submit --jars /path/spark-csv_2.10-1.1.0.jar xxx.py > ``` > However I need to run my unit tests like: > ``` > py.test -vvs test_xxx.py > ``` > It couldn't add jars via the '--jars' argument. > Therefore I tried to use the SparkContext.addPyFile() API t
[jira] [Updated] (SPARK-10658) Could pyspark provide addJars() as scala spark API?
[ https://issues.apache.org/jira/browse/SPARK-10658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ryanchou updated SPARK-10658: - Description: My spark program was written with the pyspark API, and it uses the spark-csv jar library. I can submit the task with spark-submit, and add the `--jars` argument to use the spark-csv jar library, as in the following command: ``` /bin/spark-submit --jars /path/spark-csv_2.10-1.1.0.jar xxx.py ``` However I need to run my unit tests like: ``` py.test -vvs test_xxx.py ``` It couldn't add jars via the '--jars' argument. Therefore I tried to use the SparkContext.addPyFile() API to add jars in my test_xxx.py. Because I saw that addPyFile()'s doc mentions PACKAGES_EXTENSIONS = (.zip, .py, .jar). Does it mean that I can add *.jar (jar libraries) by using addPyFile()? The code which uses addPyFile() to add jars is below: ``` self.sc.addPyFile(join(lib_path, "spark-csv_2.10-1.1.0.jar")) sqlContext = SQLContext(self.sc) self.dataframe = sqlContext.load( source="com.databricks.spark.csv", header="true", path="xxx.csv" ) ``` However it doesn't work: sqlContext cannot load the source (com.databricks.spark.csv). Eventually I found another way: setting the environment variable SPARK_CLASSPATH to load the jar libraries ``` SPARK_CLASSPATH="/path/xxx.jar:/path/xxx2.jar" py.test -vvs test_xxx.py ``` It loads the jar libraries and sqlContext loads the source successfully, just like adding the `--jars xxx1.jar` argument. This is the situation of using third-party jars (.py & .egg work well by using addPyFile()) in pyspark-written scripts, where `--jars` cannot be used (py.test -vvs test_xxx.py). Have you ever planned to provide an API such as addJars() in Scala for adding jars to a Spark program, or is there a better way to add jars that I still haven't found? was: My spark program was written with the pyspark API, and it uses the spark-csv jar library. I can submit the task with spark-submit, and add the `--jars` argument to use the spark-csv jar library, as in the following command: ``` /bin/spark-submit --jars /path/spark-csv_2.10-1.1.0.jar xxx.py ``` However I need to run my unit tests like: ``` py.test -vvs test_xxx.py ``` It couldn't add jars via the '--jars' argument. Therefore I tried to use the SparkContext.addPyFile() API to add jars in my test_xxx.py. Because I saw that addPyFile()'s doc mentions PACKAGES_EXTENSIONS = (.zip, .py, .jar). Does it mean that I can add *.jar (jar libraries) by using addPyFile()? The code which uses addPyFile() to add jars is below: ``` self.sc.addPyFile(join(lib_path, "spark-csv_2.10-1.1.0.jar")) sqlContext = SQLContext(self.sc) self.dataframe = sqlContext.load( source="com.databricks.spark.csv", header="true", path="xxx.csv" ) ``` However it doesn't work: sqlContext cannot load the source (com.databricks.spark.csv). Eventually I found another way: setting the environment variable SPARK_CLASSPATH to load the jar libraries ``` SPARK_CLASSPATH="/path/xxx.jar:/path/xxx2.jar" py.test -vvs test_xxx.py ``` It loads the jar libraries and sqlContext loads the source successfully, just like adding the `--jars xxx1.jar` argument. This is the situation of using third-party jars (.py & .egg work well by using addPyFile()) in pyspark-written scripts, where `--jars` cannot be used (py.test -vvs test_xxx.py). Have you ever planned to provide an API such as addJars() in Scala for adding jars to a Spark program, or is there a better way to add jars that I still haven't found? 
> Could pyspark provide addJars() as scala spark API? > > > Key: SPARK-10658 > URL: https://issues.apache.org/jira/browse/SPARK-10658 > Project: Spark > Issue Type: Wish > Components: PySpark >Affects Versions: 1.3.1 > Environment: Linux ubuntu 14.01 LTS >Reporter: ryanchou > Labels: features > Original Estimate: 48h > Remaining Estimate: 48h > > My spark program was written with the pyspark API, and it uses the spark-csv > jar library. > I can submit the task with spark-submit, and add the `--jars` argument to use the > spark-csv jar library, as in the following command: > ``` > /bin/spark-submit --jars /path/spark-csv_2.10-1.1.0.jar xxx.py > ``` > However I need to run my unit tests like: > ``` > py.test -vvs test_xxx.py > ``` > It couldn't add jars via the '--jars' argument. > Therefore I tried to use the SparkContext.addPyFile() API to add jars in my > test_xxx.py. > Because I saw that addPyFile()'s doc mentions PACKAGES_EXTENSIONS = (.zip, > .py, .j
[jira] [Created] (SPARK-10658) Could pyspark provide addJars() as scala spark API?
ryanchou created SPARK-10658: Summary: Could pyspark provide addJars() as scala spark API? Key: SPARK-10658 URL: https://issues.apache.org/jira/browse/SPARK-10658 Project: Spark Issue Type: Wish Components: PySpark Affects Versions: 1.3.1 Environment: Linux ubuntu 14.01 LTS Reporter: ryanchou My spark program was written with the pyspark API, and it uses the spark-csv jar library. I can submit the task with spark-submit, and add the `--jars` argument to use the spark-csv jar library, as in the following command: ``` /bin/spark-submit --jars /path/spark-csv_2.10-1.1.0.jar xxx.py ``` However I need to run my unit tests like: ``` py.test -vvs test_xxx.py ``` It couldn't add jars via the '--jars' argument. Therefore I tried to use the SparkContext.addPyFile() API to add jars in my test_xxx.py. Because I saw that addPyFile()'s doc mentions PACKAGES_EXTENSIONS = (.zip, .py, .jar). Does it mean that I can add *.jar (jar libraries) by using addPyFile()? The code which uses addPyFile() to add jars is below: ``` self.sc.addPyFile(join(lib_path, "spark-csv_2.10-1.1.0.jar")) sqlContext = SQLContext(self.sc) self.dataframe = sqlContext.load( source="com.databricks.spark.csv", header="true", path="xxx.csv" ) ``` However it doesn't work: sqlContext cannot load the source (com.databricks.spark.csv). Eventually I found another way: setting the environment variable SPARK_CLASSPATH to load the jar libraries ``` SPARK_CLASSPATH="/path/xxx.jar:/path/xxx2.jar" py.test -vvs test_xxx.py ``` It loads the jar libraries and sqlContext loads the source successfully, just like adding the `--jars xxx1.jar` argument. This is the situation of using third-party jars (.py & .egg work well by using addPyFile()) in pyspark-written scripts, where `--jars` cannot be used (py.test -vvs test_xxx.py). Have you ever planned to provide an API such as addJars() in Scala for adding jars to a Spark program, or is there a better way to add jars that I still haven't found? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
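For reference, the Scala-side API the report alludes to is SparkContext.addJar; a minimal sketch (master URL and jar path are placeholders, not from the ticket) of the behaviour being requested for pyspark:
{code}
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical standalone snippet; the master URL and jar path are placeholders.
val conf = new SparkConf().setAppName("addJar-demo").setMaster("local[2]")
val sc = new SparkContext(conf)

// Ships the jar to executors so that tasks can load classes from it.
// Note that the driver's own classpath is not changed by addJar.
sc.addJar("/path/spark-csv_2.10-1.1.0.jar")
{code}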
[jira] [Assigned] (SPARK-10657) Remove legacy SCP-based Jenkins log archiving code
[ https://issues.apache.org/jira/browse/SPARK-10657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10657: Assignee: Josh Rosen (was: Apache Spark) > Remove legacy SCP-based Jenkins log archiving code > -- > > Key: SPARK-10657 > URL: https://issues.apache.org/jira/browse/SPARK-10657 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Reporter: Josh Rosen >Assignee: Josh Rosen > > As of https://issues.apache.org/jira/browse/SPARK-7561, we no longer need to > use our custom SCP-based mechanism for archiving Jenkins logs on the master > machine; this has been superseded by the use of a Jenkins plugin which > archives the logs and provides public viewing of them. > We should remove the legacy log syncing code, since this is a blocker to > disabling Worker -> Master SSH on Jenkins. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10657) Remove legacy SCP-based Jenkins log archiving code
[ https://issues.apache.org/jira/browse/SPARK-10657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-10657: --- Description: As of https://issues.apache.org/jira/browse/SPARK-7561, we no longer need to use our custom SCP-based mechanism for archiving Jenkins logs on the master machine; this has been superseded by the use of a Jenkins plugin which archives the logs and provides public viewing of them. We should remove the legacy log syncing code, since this is a blocker to disabling Worker -> Master SSH on Jenkins. was: As of https://issues.apache.org/jira/browse/SPARK-7561, we no longer need to use our custom SSH-based mechanism for archiving Jenkins logs on the master machine; this has been superseded by the use of a Jenkins plugin which archives the logs and provides public viewing of them. We should remove the legacy log syncing code, since this is a blocker to disabling Worker -> Master SSH on Jenkins. > Remove legacy SCP-based Jenkins log archiving code > -- > > Key: SPARK-10657 > URL: https://issues.apache.org/jira/browse/SPARK-10657 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Reporter: Josh Rosen >Assignee: Josh Rosen > > As of https://issues.apache.org/jira/browse/SPARK-7561, we no longer need to > use our custom SCP-based mechanism for archiving Jenkins logs on the master > machine; this has been superseded by the use of a Jenkins plugin which > archives the logs and provides public viewing of them. > We should remove the legacy log syncing code, since this is a blocker to > disabling Worker -> Master SSH on Jenkins. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10657) Remove legacy SCP-based Jenkins log archiving code
[ https://issues.apache.org/jira/browse/SPARK-10657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-10657: --- Summary: Remove legacy SCP-based Jenkins log archiving code (was: Remove legacy SSH-based Jenkins log archiving code) > Remove legacy SCP-based Jenkins log archiving code > -- > > Key: SPARK-10657 > URL: https://issues.apache.org/jira/browse/SPARK-10657 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Reporter: Josh Rosen >Assignee: Josh Rosen > > As of https://issues.apache.org/jira/browse/SPARK-7561, we no longer need to > use our custom SSH-based mechanism for archiving Jenkins logs on the master > machine; this has been superseded by the use of a Jenkins plugin which > archives the logs and provides public viewing of them. > We should remove the legacy log syncing code, since this is a blocker to > disabling Worker -> Master SSH on Jenkins. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10657) Remove legacy SCP-based Jenkins log archiving code
[ https://issues.apache.org/jira/browse/SPARK-10657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791549#comment-14791549 ] Apache Spark commented on SPARK-10657: -- User 'JoshRosen' has created a pull request for this issue: https://github.com/apache/spark/pull/8793 > Remove legacy SCP-based Jenkins log archiving code > -- > > Key: SPARK-10657 > URL: https://issues.apache.org/jira/browse/SPARK-10657 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Reporter: Josh Rosen >Assignee: Josh Rosen > > As of https://issues.apache.org/jira/browse/SPARK-7561, we no longer need to > use our custom SSH-based mechanism for archiving Jenkins logs on the master > machine; this has been superseded by the use of a Jenkins plugin which > archives the logs and provides public viewing of them. > We should remove the legacy log syncing code, since this is a blocker to > disabling Worker -> Master SSH on Jenkins. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10657) Remove legacy SCP-based Jenkins log archiving code
[ https://issues.apache.org/jira/browse/SPARK-10657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10657: Assignee: Apache Spark (was: Josh Rosen) > Remove legacy SCP-based Jenkins log archiving code > -- > > Key: SPARK-10657 > URL: https://issues.apache.org/jira/browse/SPARK-10657 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Reporter: Josh Rosen >Assignee: Apache Spark > > As of https://issues.apache.org/jira/browse/SPARK-7561, we no longer need to > use our custom SCP-based mechanism for archiving Jenkins logs on the master > machine; this has been superseded by the use of a Jenkins plugin which > archives the logs and provides public viewing of them. > We should remove the legacy log syncing code, since this is a blocker to > disabling Worker -> Master SSH on Jenkins. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10657) Remove legacy SSH-based Jenkins log archiving code
Josh Rosen created SPARK-10657: -- Summary: Remove legacy SSH-based Jenkins log archiving code Key: SPARK-10657 URL: https://issues.apache.org/jira/browse/SPARK-10657 Project: Spark Issue Type: Improvement Components: Project Infra Reporter: Josh Rosen Assignee: Josh Rosen As of https://issues.apache.org/jira/browse/SPARK-7561, we no longer need to use our custom SSH-based mechanism for archiving Jenkins logs on the master machine; this has been superseded by the use of a Jenkins plugin which archives the logs and provides public viewing of them. We should remove the legacy log syncing code, since this is a blocker to disabling Worker -> Master SSH on Jenkins. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10606) Cube/Rollup/GrpSet doesn't create the correct plan when group by is on something other than an AttributeReference
[ https://issues.apache.org/jira/browse/SPARK-10606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791499#comment-14791499 ] Cheng Hao commented on SPARK-10606: --- [~rhbutani] Which version are you using? I actually fixed this bug in SPARK-8972, and the fix should be included in 1.5. Can you try with 1.5? > Cube/Rollup/GrpSet doesn't create the correct plan when group by is on > something other than an AttributeReference > - > > Key: SPARK-10606 > URL: https://issues.apache.org/jira/browse/SPARK-10606 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Harish Butani >Priority: Critical > > Consider the following table: t(a : String, b : String) and the query > {code} > select a, concat(b, '1'), count(*) > from t > group by a, concat(b, '1') with cube > {code} > The projections in the Expand operator are not set up correctly. The expand > logic in Analyzer:expand is comparing grouping expressions against > child.output. So {{concat(b, '1')}} is never mapped to a null Literal. > A simple fix is to add a Rule to introduce a Projection below the > Cube/Rollup/GrpSet operator that additionally projects the > groupingExpressions that are missing in the child. > Marking this as Critical, because you get wrong results. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
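A possible user-level workaround, sketched here only to illustrate the projection idea from the description (this is not the Analyzer fix itself; column and table names are the ones from the example): materialize the derived expression under an alias first, so the grouping columns are plain attribute references.
{code}
import org.apache.spark.sql.functions.{col, concat, lit}

// Assumes a DataFrame `t` with string columns a and b, as in the example above.
val projected = t.withColumn("b1", concat(col("b"), lit("1")))
val cubed = projected.cube("a", "b1").count() // cube now groups on simple attributes
cubed.show()
{code}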
[jira] [Created] (SPARK-10656) select(df(*)) fails when a column has special characters
Nick Pritchard created SPARK-10656: -- Summary: select(df(*)) fails when a column has special characters Key: SPARK-10656 URL: https://issues.apache.org/jira/browse/SPARK-10656 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Reporter: Nick Pritchard Best explained with this example: {code} val df = sqlContext.read.json(sqlContext.sparkContext.makeRDD( """{"a.b": "c", "d": "e" }""" :: Nil)) df.select("*").show() //successful df.select(df("*")).show() //throws exception df.withColumnRenamed("d", "f").show() //also fails, possibly related {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8547) xgboost exploration
[ https://issues.apache.org/jira/browse/SPARK-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791472#comment-14791472 ] Tian Jian Wang commented on SPARK-8547: --- Venkata, I have started on this as a pet project before. If you haven't started yet, can I try? > xgboost exploration > --- > > Key: SPARK-8547 > URL: https://issues.apache.org/jira/browse/SPARK-8547 > Project: Spark > Issue Type: New Feature > Components: ML, MLlib >Reporter: Joseph K. Bradley > > There has been quite a bit of excitement around xgboost: > [https://github.com/dmlc/xgboost] > It improves the parallelism of boosting by mixing boosting and bagging (where > bagging makes the algorithm more parallel). > It would be worth exploring implementing this within MLlib (probably as a new > algorithm). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10644) Applications wait even if free executors are available
[ https://issues.apache.org/jira/browse/SPARK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791443#comment-14791443 ] Balagopal Nair commented on SPARK-10644: Standalone cluster manager. I've verified this behaviour again now. > Applications wait even if free executors are available > -- > > Key: SPARK-10644 > URL: https://issues.apache.org/jira/browse/SPARK-10644 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 1.5.0 > Environment: RHEL 6.5 64 bit >Reporter: Balagopal Nair > > Number of workers: 21 > Number of executors: 63 > Steps to reproduce: > 1. Run 4 jobs each with max cores set to 10 > 2. The first 3 jobs run with 10 each. (30 executors consumed so far) > 3. The 4th job waits even though there are 33 idle executors. > The reason is that a job will not get executors unless > the total number of EXECUTORS in use < the number of WORKERS > If there are executors available, resources should be allocated to the > pending job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
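"Max cores" in the reproduction steps refers to the standalone scheduler's per-application core cap; a minimal sketch (application name and value are illustrative) of how each of the four jobs would set it:
{code}
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative: caps this application at 10 cores on a standalone cluster,
// i.e. the "max cores set to 10" from the reproduction steps.
val conf = new SparkConf()
  .setAppName("job-1")
  .set("spark.cores.max", "10")
val sc = new SparkContext(conf)
{code}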
[jira] [Commented] (SPARK-10655) Enhance DB2 dialect to handle XML, DECIMAL, and DECFLOAT
[ https://issues.apache.org/jira/browse/SPARK-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791440#comment-14791440 ] Suresh Thalamati commented on SPARK-10655: -- I am working on a pull request for this issue. > Enhance DB2 dialect to handle XML, DECIMAL, and DECFLOAT > - > > Key: SPARK-10655 > URL: https://issues.apache.org/jira/browse/SPARK-10655 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.5.0 >Reporter: Suresh Thalamati > > Default type mapping does not work when reading from a DB2 table that contains > XML or DECFLOAT columns, or when writing DECIMAL columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10644) Applications wait even if free executors are available
[ https://issues.apache.org/jira/browse/SPARK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791439#comment-14791439 ] Saisai Shao commented on SPARK-10644: - So which cluster manager do you use: standalone, Mesos or YARN? There shouldn't be such a problem if resources are sufficient, as far as I know. > Applications wait even if free executors are available > -- > > Key: SPARK-10644 > URL: https://issues.apache.org/jira/browse/SPARK-10644 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 1.5.0 > Environment: RHEL 6.5 64 bit >Reporter: Balagopal Nair > > Number of workers: 21 > Number of executors: 63 > Steps to reproduce: > 1. Run 4 jobs each with max cores set to 10 > 2. The first 3 jobs run with 10 each. (30 executors consumed so far) > 3. The 4th job waits even though there are 33 idle executors. > The reason is that a job will not get executors unless > the total number of EXECUTORS in use < the number of WORKERS > If there are executors available, resources should be allocated to the > pending job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10655) Enhance DB2 dialect to handle XML, DECIMAL, and DECFLOAT
Suresh Thalamati created SPARK-10655: Summary: Enhance DB2 dialect to handle XML, DECIMAL, and DECFLOAT Key: SPARK-10655 URL: https://issues.apache.org/jira/browse/SPARK-10655 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.5.0 Reporter: Suresh Thalamati Default type mapping does not work when reading from a DB2 table that contains XML or DECFLOAT columns, or when writing DECIMAL columns. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10640) Spark history server fails to parse taskEndReasonFromJson TaskCommitDenied
[ https://issues.apache.org/jira/browse/SPARK-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791438#comment-14791438 ] Thomas Graves commented on SPARK-10640: --- Yes, a 1.5 history server reading 1.5.0 logs. I'm not as worried about forward compatibility, but it would be nice if we handled values like this and put blank or unknown so the log will at least be viewable. > Spark history server fails to parse taskEndReasonFromJson TaskCommitDenied > -- > > Key: SPARK-10640 > URL: https://issues.apache.org/jira/browse/SPARK-10640 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.3.0, 1.4.0, 1.5.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Critical > > I'm seeing an exception from the spark history server trying to read a > history file: > scala.MatchError: TaskCommitDenied (of class java.lang.String) > at > org.apache.spark.util.JsonProtocol$.taskEndReasonFromJson(JsonProtocol.scala:775) > at > org.apache.spark.util.JsonProtocol$.taskEndFromJson(JsonProtocol.scala:531) > at > org.apache.spark.util.JsonProtocol$.sparkEventFromJson(JsonProtocol.scala:488) > at > org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:58) > at > org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$replay(FsHistoryProvider.scala:457) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:292) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:289) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) > at > scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251) > at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105) > at > org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$mergeApplicationListing(FsHistoryProvider.scala:289) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$checkForLogs$1$$anon$2.run(FsHistoryProvider.scala:210) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
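As a purely illustrative sketch of the tolerant parsing being suggested (not the actual JsonProtocol code, which parses full JSON objects rather than bare strings), a reason parser could carry an "unknown" fallback instead of throwing a MatchError:
{code}
// Simplified, hypothetical sketch of a fallback case for unrecognized task-end reasons.
sealed trait Reason
case object Success extends Reason
case object TaskCommitDenied extends Reason
case class Unknown(raw: String) extends Reason

def reasonFromString(s: String): Reason = s match {
  case "Success"          => Success
  case "TaskCommitDenied" => TaskCommitDenied
  case other              => Unknown(other) // tolerate reasons this version doesn't know about
}
{code}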
[jira] [Comment Edited] (SPARK-10644) Applications wait even if free executors are available
[ https://issues.apache.org/jira/browse/SPARK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791434#comment-14791434 ] Balagopal Nair edited comment on SPARK-10644 at 9/17/15 1:51 AM: - No. These are independent jobs running under different SparkContexts. Sorry about not being clear enough before... I'm trying to share the same cluster between various applications. This issue is related to scheduling across applications and not within the same application. was (Author: nbalagopal): No. These are independent jobs running under different SparkContexts > Applications wait even if free executors are available > -- > > Key: SPARK-10644 > URL: https://issues.apache.org/jira/browse/SPARK-10644 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 1.5.0 > Environment: RHEL 6.5 64 bit >Reporter: Balagopal Nair > > Number of workers: 21 > Number of executors: 63 > Steps to reproduce: > 1. Run 4 jobs each with max cores set to 10 > 2. The first 3 jobs run with 10 each. (30 executors consumed so far) > 3. The 4th job waits even though there are 33 idle executors. > The reason is that a job will not get executors unless > the total number of EXECUTORS in use < the number of WORKERS > If there are executors available, resources should be allocated to the > pending job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10644) Applications wait even if free executors are available
[ https://issues.apache.org/jira/browse/SPARK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791434#comment-14791434 ] Balagopal Nair commented on SPARK-10644: No. These are independent jobs running under different SparkContexts. > Applications wait even if free executors are available > -- > > Key: SPARK-10644 > URL: https://issues.apache.org/jira/browse/SPARK-10644 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 1.5.0 > Environment: RHEL 6.5 64 bit >Reporter: Balagopal Nair > > Number of workers: 21 > Number of executors: 63 > Steps to reproduce: > 1. Run 4 jobs each with max cores set to 10 > 2. The first 3 jobs run with 10 each. (30 executors consumed so far) > 3. The 4th job waits even though there are 33 idle executors. > The reason is that a job will not get executors unless > the total number of EXECUTORS in use < the number of WORKERS > If there are executors available, resources should be allocated to the > pending job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10654) Add columnSimilarities to IndexedRowMatrix
[ https://issues.apache.org/jira/browse/SPARK-10654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10654: Assignee: Apache Spark > Add columnSimilarities to IndexedRowMatrix > -- > > Key: SPARK-10654 > URL: https://issues.apache.org/jira/browse/SPARK-10654 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: Reza Zadeh >Assignee: Apache Spark > > Add columnSimilarities to IndexedRowMatrix. > Adding rowSimilarities to IndexedRowMatrix is tracked in another JIRA, > SPARK-4823 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10654) Add columnSimilarities to IndexedRowMatrix
[ https://issues.apache.org/jira/browse/SPARK-10654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791425#comment-14791425 ] Apache Spark commented on SPARK-10654: -- User 'rezazadeh' has created a pull request for this issue: https://github.com/apache/spark/pull/8792 > Add columnSimilarities to IndexedRowMatrix > -- > > Key: SPARK-10654 > URL: https://issues.apache.org/jira/browse/SPARK-10654 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: Reza Zadeh > > Add columnSimilarities to IndexedRowMatrix. > Adding rowSimilarities to IndexedRowMatrix is tracked in another JIRA, > SPARK-4823 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10654) Add columnSimilarities to IndexedRowMatrix
[ https://issues.apache.org/jira/browse/SPARK-10654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10654: Assignee: (was: Apache Spark) > Add columnSimilarities to IndexedRowMatrix > -- > > Key: SPARK-10654 > URL: https://issues.apache.org/jira/browse/SPARK-10654 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: Reza Zadeh > > Add columnSimilarities to IndexedRowMatrix. > Adding rowSimilarities to IndexedRowMatrix is tracked in another JIRA, > SPARK-4823 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10654) Add columnSimilarities to IndexedRowMatrix
Reza Zadeh created SPARK-10654: -- Summary: Add columnSimilarities to IndexedRowMatrix Key: SPARK-10654 URL: https://issues.apache.org/jira/browse/SPARK-10654 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Reza Zadeh Add columnSimilarities to IndexedRowMatrix. Adding rowSimilarities to IndexedRowMatrix is tracked in another JIRA, SPARK-4823 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10625) Spark SQL JDBC read/write is unable to handle JDBC Drivers that add unserializable objects into connection properties
[ https://issues.apache.org/jira/browse/SPARK-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791420#comment-14791420 ] Apache Spark commented on SPARK-10625: -- User 'tribbloid' has created a pull request for this issue: https://github.com/apache/spark/pull/8785 > Spark SQL JDBC read/write is unable to handle JDBC Drivers that add > unserializable objects into connection properties > -- > > Key: SPARK-10625 > URL: https://issues.apache.org/jira/browse/SPARK-10625 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.1, 1.5.0 > Environment: Ubuntu 14.04 >Reporter: Peng Cheng > Labels: jdbc, spark, sparksql > > Some JDBC drivers (e.g. SAP HANA) try to optimize connection pooling by > adding new objects into the connection properties, which are then reused by > Spark and deployed to workers. When some of these new objects are not > serializable, this triggers an org.apache.spark.SparkException: Task not > serializable. The following test code snippet demonstrates this problem by > using a modified H2 driver: > test("INSERT to JDBC Datasource with UnserializableH2Driver") { > object UnserializableH2Driver extends org.h2.Driver { > override def connect(url: String, info: Properties): Connection = { > val result = super.connect(url, info) > info.put("unserializableDriver", this) > result > } > override def getParentLogger: Logger = ??? > } > import scala.collection.JavaConversions._ > val oldDrivers = > DriverManager.getDrivers.filter(_.acceptsURL("jdbc:h2:")).toSeq > oldDrivers.foreach{ > DriverManager.deregisterDriver > } > DriverManager.registerDriver(UnserializableH2Driver) > sql("INSERT INTO TABLE PEOPLE1 SELECT * FROM PEOPLE") > assert(2 === sqlContext.read.jdbc(url1, "TEST.PEOPLE1", properties).count) > assert(2 === sqlContext.read.jdbc(url1, "TEST.PEOPLE1", > properties).collect()(0).length) > DriverManager.deregisterDriver(UnserializableH2Driver) > oldDrivers.foreach{ > DriverManager.registerDriver > } > } -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
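One general mitigation sketch, hypothetical and not necessarily what the linked pull request does: copy only the string-valued entries of the connection Properties before they are captured by a task closure, so driver-injected objects never need to be serialized.
{code}
import java.util.Properties
import scala.collection.JavaConverters._

// Hypothetical helper: keep only plain string properties so that objects a driver
// stuffs into the Properties map never travel to executors.
def serializableCopy(props: Properties): Properties = {
  val copy = new Properties()
  for (key <- props.stringPropertyNames().asScala) {
    copy.setProperty(key, props.getProperty(key))
  }
  copy
}
{code}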
[jira] [Assigned] (SPARK-10625) Spark SQL JDBC read/write is unable to handle JDBC Drivers that add unserializable objects into connection properties
[ https://issues.apache.org/jira/browse/SPARK-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10625: Assignee: (was: Apache Spark) > Spark SQL JDBC read/write is unable to handle JDBC Drivers that add > unserializable objects into connection properties > -- > > Key: SPARK-10625 > URL: https://issues.apache.org/jira/browse/SPARK-10625 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.1, 1.5.0 > Environment: Ubuntu 14.04 >Reporter: Peng Cheng > Labels: jdbc, spark, sparksql > > Some JDBC drivers (e.g. SAP HANA) try to optimize connection pooling by > adding new objects into the connection properties, which are then reused by > Spark and deployed to workers. When some of these new objects are not > serializable, this triggers an org.apache.spark.SparkException: Task not > serializable. The following test code snippet demonstrates this problem by > using a modified H2 driver: > test("INSERT to JDBC Datasource with UnserializableH2Driver") { > object UnserializableH2Driver extends org.h2.Driver { > override def connect(url: String, info: Properties): Connection = { > val result = super.connect(url, info) > info.put("unserializableDriver", this) > result > } > override def getParentLogger: Logger = ??? > } > import scala.collection.JavaConversions._ > val oldDrivers = > DriverManager.getDrivers.filter(_.acceptsURL("jdbc:h2:")).toSeq > oldDrivers.foreach{ > DriverManager.deregisterDriver > } > DriverManager.registerDriver(UnserializableH2Driver) > sql("INSERT INTO TABLE PEOPLE1 SELECT * FROM PEOPLE") > assert(2 === sqlContext.read.jdbc(url1, "TEST.PEOPLE1", properties).count) > assert(2 === sqlContext.read.jdbc(url1, "TEST.PEOPLE1", > properties).collect()(0).length) > DriverManager.deregisterDriver(UnserializableH2Driver) > oldDrivers.foreach{ > DriverManager.registerDriver > } > } -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10625) Spark SQL JDBC read/write is unable to handle JDBC Drivers that add unserializable objects into connection properties
[ https://issues.apache.org/jira/browse/SPARK-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10625: Assignee: Apache Spark > Spark SQL JDBC read/write is unable to handle JDBC Drivers that add > unserializable objects into connection properties > -- > > Key: SPARK-10625 > URL: https://issues.apache.org/jira/browse/SPARK-10625 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.1, 1.5.0 > Environment: Ubuntu 14.04 >Reporter: Peng Cheng >Assignee: Apache Spark > Labels: jdbc, spark, sparksql > > Some JDBC drivers (e.g. SAP HANA) try to optimize connection pooling by > adding new objects into the connection properties, which are then reused by > Spark and deployed to workers. When some of these new objects are not > serializable, this triggers an org.apache.spark.SparkException: Task not > serializable. The following test code snippet demonstrates this problem by > using a modified H2 driver: > test("INSERT to JDBC Datasource with UnserializableH2Driver") { > object UnserializableH2Driver extends org.h2.Driver { > override def connect(url: String, info: Properties): Connection = { > val result = super.connect(url, info) > info.put("unserializableDriver", this) > result > } > override def getParentLogger: Logger = ??? > } > import scala.collection.JavaConversions._ > val oldDrivers = > DriverManager.getDrivers.filter(_.acceptsURL("jdbc:h2:")).toSeq > oldDrivers.foreach{ > DriverManager.deregisterDriver > } > DriverManager.registerDriver(UnserializableH2Driver) > sql("INSERT INTO TABLE PEOPLE1 SELECT * FROM PEOPLE") > assert(2 === sqlContext.read.jdbc(url1, "TEST.PEOPLE1", properties).count) > assert(2 === sqlContext.read.jdbc(url1, "TEST.PEOPLE1", > properties).collect()(0).length) > DriverManager.deregisterDriver(UnserializableH2Driver) > oldDrivers.foreach{ > DriverManager.registerDriver > } > } -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10640) Spark history server fails to parse taskEndReasonFromJson TaskCommitDenied
[ https://issues.apache.org/jira/browse/SPARK-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791416#comment-14791416 ] Josh Rosen commented on SPARK-10640: This is a 1.5.0 history server reading 1.5.0 logs? In principle we also have this bug when trying to read 1.5.x logs with a 1.4.x history server. I'm going to mark this as a 1.5.1 critical bug to make sure it gets fixed there. This probably affects 1.3.x and 1.4.x, too, so I'm going to update the affected versions. > Spark history server fails to parse taskEndReasonFromJson TaskCommitDenied > -- > > Key: SPARK-10640 > URL: https://issues.apache.org/jira/browse/SPARK-10640 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.3.0, 1.4.0, 1.5.0 >Reporter: Thomas Graves >Assignee: Thomas Graves > > I'm seeing an exception from the spark history server trying to read a > history file: > scala.MatchError: TaskCommitDenied (of class java.lang.String) > at > org.apache.spark.util.JsonProtocol$.taskEndReasonFromJson(JsonProtocol.scala:775) > at > org.apache.spark.util.JsonProtocol$.taskEndFromJson(JsonProtocol.scala:531) > at > org.apache.spark.util.JsonProtocol$.sparkEventFromJson(JsonProtocol.scala:488) > at > org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:58) > at > org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$replay(FsHistoryProvider.scala:457) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:292) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:289) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) > at > scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251) > at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105) > at > org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$mergeApplicationListing(FsHistoryProvider.scala:289) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$checkForLogs$1$$anon$2.run(FsHistoryProvider.scala:210) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10640) Spark history server fails to parse taskEndReasonFromJson TaskCommitDenied
[ https://issues.apache.org/jira/browse/SPARK-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-10640: --- Affects Version/s: 1.3.0 1.4.0 Target Version/s: 1.5.1 Priority: Critical (was: Major) > Spark history server fails to parse taskEndReasonFromJson TaskCommitDenied > -- > > Key: SPARK-10640 > URL: https://issues.apache.org/jira/browse/SPARK-10640 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.3.0, 1.4.0, 1.5.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Critical > > I'm seeing an exception from the spark history server trying to read a > history file: > scala.MatchError: TaskCommitDenied (of class java.lang.String) > at > org.apache.spark.util.JsonProtocol$.taskEndReasonFromJson(JsonProtocol.scala:775) > at > org.apache.spark.util.JsonProtocol$.taskEndFromJson(JsonProtocol.scala:531) > at > org.apache.spark.util.JsonProtocol$.sparkEventFromJson(JsonProtocol.scala:488) > at > org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:58) > at > org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$replay(FsHistoryProvider.scala:457) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:292) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:289) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) > at > scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251) > at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105) > at > org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$mergeApplicationListing(FsHistoryProvider.scala:289) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$checkForLogs$1$$anon$2.run(FsHistoryProvider.scala:210) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10635) pyspark - running on a different host
[ https://issues.apache.org/jira/browse/SPARK-10635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791389#comment-14791389 ] Josh Rosen commented on SPARK-10635: [~davies], do you think we should support this? This seems like a hard-to-support feature, so I'm inclined to say that this issue is "Won't Fix" as currently described. > pyspark - running on a different host > - > > Key: SPARK-10635 > URL: https://issues.apache.org/jira/browse/SPARK-10635 > Project: Spark > Issue Type: Improvement >Reporter: Ben Duffield > > At various points we assume we only ever talk to a driver on the same host. > e.g. > https://github.com/apache/spark/blob/v1.4.1/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala#L615 > We use pyspark to connect to an existing driver (i.e. do not let pyspark > launch the driver itself, but instead construct the SparkContext with the > gateway and jsc arguments. > There are a few reasons for this, but essentially it's to allow more > flexibility when running in AWS. > Before 1.3.1 we were able to monkeypatch around this: > {code} > def _load_from_socket(port, serializer): > sock = socket.socket() > sock.settimeout(3) > try: > sock.connect((host, port)) > rf = sock.makefile("rb", 65536) > for item in serializer.load_stream(rf): > yield item > finally: > sock.close() > pyspark.rdd._load_from_socket = _load_from_socket > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10653) Remove unnecessary things from SparkEnv
[ https://issues.apache.org/jira/browse/SPARK-10653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791388#comment-14791388 ] Josh Rosen commented on SPARK-10653: Note that SparkEnv is technically a developer API, but all of its fields point to things which are non-developer-API. Thus I feel that there's not a compatibility concern here, but others might disagree. > Remove unnecessary things from SparkEnv > --- > > Key: SPARK-10653 > URL: https://issues.apache.org/jira/browse/SPARK-10653 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Andrew Or > > As of the writing of this message, there are at least two things that can be > removed from it: > {code} > @DeveloperApi > class SparkEnv ( > val executorId: String, > private[spark] val rpcEnv: RpcEnv, > val serializer: Serializer, > val closureSerializer: Serializer, > val cacheManager: CacheManager, > val mapOutputTracker: MapOutputTracker, > val shuffleManager: ShuffleManager, > val broadcastManager: BroadcastManager, > val blockTransferService: BlockTransferService, // this one can go > val blockManager: BlockManager, > val securityManager: SecurityManager, > val httpFileServer: HttpFileServer, > val sparkFilesDir: String, // this one maybe? It's only used in 1 place. > val metricsSystem: MetricsSystem, > val shuffleMemoryManager: ShuffleMemoryManager, > val executorMemoryManager: ExecutorMemoryManager, // this can go > val outputCommitCoordinator: OutputCommitCoordinator, > val conf: SparkConf) extends Logging { > ... > } > {code} > We should avoid adding to this infinite list of things in SparkEnv's > constructors if they're not needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10647) Mesos HA mode misuses spark.deploy.zookeeper.dir property; configs should be documented
[ https://issues.apache.org/jira/browse/SPARK-10647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-10647: --- Affects Version/s: 1.4.1 1.5.0 > Mesos HA mode misuses spark.deploy.zookeeper.dir property; configs should be > documented > --- > > Key: SPARK-10647 > URL: https://issues.apache.org/jira/browse/SPARK-10647 > Project: Spark > Issue Type: Improvement > Components: Mesos >Affects Versions: 1.4.1, 1.5.0 >Reporter: Alan Braithwaite >Assignee: Timothy Chen >Priority: Minor > > This property doesn't match up with the other properties surrounding it, > namely: > spark.mesos.deploy.zookeeper.url > and > spark.mesos.deploy.recoveryMode > Since it's also a property specific to mesos, it makes sense to be under that > hierarchy as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10647) Mesos HA mode misuses spark.deploy.zookeeper.dir property; configs should be documented
[ https://issues.apache.org/jira/browse/SPARK-10647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-10647: --- Issue Type: Bug (was: Improvement) > Mesos HA mode misuses spark.deploy.zookeeper.dir property; configs should be > documented > --- > > Key: SPARK-10647 > URL: https://issues.apache.org/jira/browse/SPARK-10647 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 1.4.1, 1.5.0 >Reporter: Alan Braithwaite >Assignee: Timothy Chen >Priority: Minor > > This property doesn't match up with the other properties surrounding it, > namely: > spark.mesos.deploy.zookeeper.url > and > spark.mesos.deploy.recoveryMode > Since it's also a property specific to mesos, it makes sense to be under that > hierarchy as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10647) Rename property spark.deploy.zookeeper.dir to spark.mesos.deploy.zookeeper.dir
[ https://issues.apache.org/jira/browse/SPARK-10647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791387#comment-14791387 ] Josh Rosen commented on SPARK-10647: The spark.deploy.zookeeper.* properties are used by the standalone mode's HA recovery features (https://spark.apache.org/docs/latest/spark-standalone.html#high-availability). I think the correct fix here is to update the Mesos code to use spark.deploy.mesos.zookeeper.dir (https://github.com/apache/spark/pull/5144/files#diff-3c5e5516915ada1d89f1259de069R97). We should also update the Mesos documentation to mention these configurations, since they don't appear to be documented anywhere. [~tnachen], I'm going to assign the doc updates and bugfixes to you. > Rename property spark.deploy.zookeeper.dir to spark.mesos.deploy.zookeeper.dir > -- > > Key: SPARK-10647 > URL: https://issues.apache.org/jira/browse/SPARK-10647 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Alan Braithwaite >Priority: Minor > > This property doesn't match up with the other properties surrounding it, > namely: > spark.mesos.deploy.zookeeper.url > and > spark.mesos.deploy.recoveryMode > Since it's also a property specific to mesos, it makes sense to be under that > hierarchy as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
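As a rough illustration of the direction described in the comment above (an assumption about the eventual patch, not the patch itself), the Mesos cluster-mode code could read a Mesos-scoped key and fall back to the legacy key only for compatibility, so that spark.deploy.zookeeper.dir stops being overloaded by Mesos HA mode. The key names follow the comment; the default value is an assumption made only for this sketch.
{code}
import org.apache.spark.SparkConf

object MesosZookeeperDirSketch {
  // Hypothetical helper, not Spark's actual config-resolution code.
  def zookeeperDir(conf: SparkConf): String =
    conf.getOption("spark.deploy.mesos.zookeeper.dir")
      .orElse(conf.getOption("spark.deploy.zookeeper.dir")) // legacy key shared with standalone HA
      .getOrElse("/spark_mesos_dispatcher")                 // assumed default, for illustration only

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf(loadDefaults = false)
      .set("spark.deploy.mesos.zookeeper.dir", "/mesos_dispatcher")
    println(zookeeperDir(conf)) // /mesos_dispatcher
  }
}
{code}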
[jira] [Updated] (SPARK-10647) Rename property spark.deploy.zookeeper.dir to spark.mesos.deploy.zookeeper.dir
[ https://issues.apache.org/jira/browse/SPARK-10647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-10647: --- Assignee: Timothy Chen > Rename property spark.deploy.zookeeper.dir to spark.mesos.deploy.zookeeper.dir > -- > > Key: SPARK-10647 > URL: https://issues.apache.org/jira/browse/SPARK-10647 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Alan Braithwaite >Assignee: Timothy Chen >Priority: Minor > > This property doesn't match up with the other properties surrounding it, > namely: > spark.mesos.deploy.zookeeper.url > and > spark.mesos.deploy.recoveryMode > Since it's also a property specific to mesos, it makes sense to be under that > hierarchy as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10647) Mesos HA mode misuses spark.deploy.zookeeper.dir property; configs should be documented
[ https://issues.apache.org/jira/browse/SPARK-10647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-10647: --- Summary: Mesos HA mode misuses spark.deploy.zookeeper.dir property; configs should be documented (was: Rename property spark.deploy.zookeeper.dir to spark.mesos.deploy.zookeeper.dir) > Mesos HA mode misuses spark.deploy.zookeeper.dir property; configs should be > documented > --- > > Key: SPARK-10647 > URL: https://issues.apache.org/jira/browse/SPARK-10647 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Alan Braithwaite >Assignee: Timothy Chen >Priority: Minor > > This property doesn't match up with the other properties surrounding it, > namely: > spark.mesos.deploy.zookeeper.url > and > spark.mesos.deploy.recoveryMode > Since it's also a property specific to mesos, it makes sense to be under that > hierarchy as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10647) Rename property spark.deploy.zookeeper.dir to spark.mesos.deploy.zookeeper.dir
[ https://issues.apache.org/jira/browse/SPARK-10647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-10647: --- Component/s: Mesos > Rename property spark.deploy.zookeeper.dir to spark.mesos.deploy.zookeeper.dir > -- > > Key: SPARK-10647 > URL: https://issues.apache.org/jira/browse/SPARK-10647 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Alan Braithwaite >Priority: Minor > > This property doesn't match up with the other properties surrounding it, > namely: > spark.mesos.deploy.zookeeper.url > and > spark.mesos.deploy.recoveryMode > Since it's also a property specific to mesos, it makes sense to be under that > hierarchy as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10644) Applications wait even if free executors are available
[ https://issues.apache.org/jira/browse/SPARK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791370#comment-14791370 ] Saisai Shao commented on SPARK-10644: - Do your jobs have dependencies? That is to say, does the 4th job rely on the first 3 jobs finishing and returning their results? > Applications wait even if free executors are available > -- > > Key: SPARK-10644 > URL: https://issues.apache.org/jira/browse/SPARK-10644 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 1.5.0 > Environment: RHEL 6.5 64 bit >Reporter: Balagopal Nair > > Number of workers: 21 > Number of executors: 63 > Steps to reproduce: > 1. Run 4 jobs each with max cores set to 10 > 2. The first 3 jobs run with 10 each. (30 executors consumed so far) > 3. The 4th job waits even though there are 33 idle executors. > The reason is that a job will not get executors unless > the total number of EXECUTORS in use < the number of WORKERS > If there are executors available, resources should be allocated to the > pending job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
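To make the reporter's diagnosis concrete, here is a small sketch of the claimed gating condition (an illustration of the report, not the actual Master scheduling code): once the number of executors in use reaches the number of workers, a waiting application gets nothing even though idle executors remain.
{code}
object ExecutorGatingSketch {
  // Simplified form of the condition quoted in the report.
  def canLaunchMore(executorsInUse: Int, totalExecutors: Int, workers: Int): Boolean =
    executorsInUse < workers && executorsInUse < totalExecutors

  def main(args: Array[String]): Unit = {
    val workers = 21
    val totalExecutors = 63
    // Three jobs holding 10 executors each: 30 in use, 33 idle, yet the 4th job waits.
    println(canLaunchMore(30, totalExecutors, workers)) // false
  }
}
{code}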
[jira] [Assigned] (SPARK-10652) Set good job descriptions for streaming related jobs
[ https://issues.apache.org/jira/browse/SPARK-10652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10652: Assignee: Tathagata Das (was: Apache Spark) > Set good job descriptions for streaming related jobs > > > Key: SPARK-10652 > URL: https://issues.apache.org/jira/browse/SPARK-10652 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.4.1, 1.5.0 >Reporter: Tathagata Das >Assignee: Tathagata Das > > Job descriptions will help distinguish jobs of one batch from the other in > the Jobs and Stages pages in the Spark UI -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10652) Set good job descriptions for streaming related jobs
[ https://issues.apache.org/jira/browse/SPARK-10652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10652: Assignee: Apache Spark (was: Tathagata Das) > Set good job descriptions for streaming related jobs > > > Key: SPARK-10652 > URL: https://issues.apache.org/jira/browse/SPARK-10652 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.4.1, 1.5.0 >Reporter: Tathagata Das >Assignee: Apache Spark > > Job descriptions will help distinguish jobs of one batch from the other in > the Jobs and Stages pages in the Spark UI -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10652) Set good job descriptions for streaming related jobs
[ https://issues.apache.org/jira/browse/SPARK-10652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791357#comment-14791357 ] Apache Spark commented on SPARK-10652: -- User 'tdas' has created a pull request for this issue: https://github.com/apache/spark/pull/8791 > Set good job descriptions for streaming related jobs > > > Key: SPARK-10652 > URL: https://issues.apache.org/jira/browse/SPARK-10652 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.4.1, 1.5.0 >Reporter: Tathagata Das >Assignee: Tathagata Das > > Job descriptions will help distinguish jobs of one batch from the other in > the Jobs and Stages pages in the Spark UI -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10381) Infinite loop when OutputCommitCoordination is enabled and OutputCommitter.commitTask throws exception
[ https://issues.apache.org/jira/browse/SPARK-10381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791354#comment-14791354 ] Apache Spark commented on SPARK-10381: -- User 'JoshRosen' has created a pull request for this issue: https://github.com/apache/spark/pull/8790 > Infinite loop when OutputCommitCoordination is enabled and > OutputCommitter.commitTask throws exception > -- > > Key: SPARK-10381 > URL: https://issues.apache.org/jira/browse/SPARK-10381 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 1.3.1, 1.4.1, 1.5.0 >Reporter: Josh Rosen >Assignee: Josh Rosen >Priority: Critical > Fix For: 1.6.0, 1.5.1 > > > When speculative execution is enabled, consider a scenario where the > authorized committer of a particular output partition fails during the > OutputCommitter.commitTask() call. In this case, the OutputCommitCoordinator > is supposed to release that committer's exclusive lock on committing once > that task fails. However, due to a unit mismatch the lock will not be > released, causing Spark to go into an infinite retry loop. > This bug was masked by the fact that the OutputCommitCoordinator does not > have enough end-to-end tests (the current tests use many mocks). Other > factors contributing to this bug are the fact that we have many > similarly-named identifiers that have different semantics but the same data > types (e.g. attemptNumber and taskAttemptId, with inconsistent variable > naming which makes them difficult to distinguish). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
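The description pins the infinite loop on a unit mismatch between attemptNumber and taskAttemptId. The sketch below is an assumed simplification of the coordinator, not its real code; it shows how comparing two identifiers of the same type but different meanings keeps the commit lock held forever, and how comparing like with like releases it.
{code}
import scala.collection.mutable

object CommitLockSketch {
  // partition -> attemptNumber of the currently authorized committer
  private val authorized = mutable.Map.empty[Int, Int]

  // The first attempt to ask wins the exclusive right to commit that partition.
  def canCommit(partition: Int, attemptNumber: Int): Boolean =
    authorized.getOrElseUpdate(partition, attemptNumber) == attemptNumber

  // Buggy shape: the failure path is handed a global task attempt id while the map stores
  // the per-stage attempt number, so the comparison never matches and the lock stays held.
  def onTaskFailedBuggy(partition: Int, taskAttemptId: Int): Unit =
    if (authorized.get(partition).contains(taskAttemptId)) authorized.remove(partition)

  // Fixed shape: compare attempt numbers with attempt numbers.
  def onTaskFailedFixed(partition: Int, attemptNumber: Int): Unit =
    if (authorized.get(partition).contains(attemptNumber)) authorized.remove(partition)

  def main(args: Array[String]): Unit = {
    println(canCommit(partition = 0, attemptNumber = 0)) // true: attempt 0 is authorized
    onTaskFailedBuggy(partition = 0, taskAttemptId = 17) // wrong unit: lock not released
    println(canCommit(partition = 0, attemptNumber = 1)) // false: the speculative retry is stuck
    onTaskFailedFixed(partition = 0, attemptNumber = 0)  // right unit: lock released
    println(canCommit(partition = 0, attemptNumber = 1)) // true: the retry can now commit
  }
}
{code}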
[jira] [Created] (SPARK-10653) Remove unnecessary things from SparkEnv
Andrew Or created SPARK-10653: - Summary: Remove unnecessary things from SparkEnv Key: SPARK-10653 URL: https://issues.apache.org/jira/browse/SPARK-10653 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0 Reporter: Andrew Or As of the writing of this message, there are at least two things that can be removed from it: {code} @DeveloperApi class SparkEnv ( val executorId: String, private[spark] val rpcEnv: RpcEnv, val serializer: Serializer, val closureSerializer: Serializer, val cacheManager: CacheManager, val mapOutputTracker: MapOutputTracker, val shuffleManager: ShuffleManager, val broadcastManager: BroadcastManager, val blockTransferService: BlockTransferService, // this one can go val blockManager: BlockManager, val securityManager: SecurityManager, val httpFileServer: HttpFileServer, val sparkFilesDir: String, // this one maybe? It's only used in 1 place. val metricsSystem: MetricsSystem, val shuffleMemoryManager: ShuffleMemoryManager, val executorMemoryManager: ExecutorMemoryManager, // this can go val outputCommitCoordinator: OutputCommitCoordinator, val conf: SparkConf) extends Logging { ... } {code} We should avoid adding to this infinite list of things in SparkEnv's constructors if they're not needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10652) Set good job descriptions for streaming related jobs
Tathagata Das created SPARK-10652: - Summary: Set good job descriptions for streaming related jobs Key: SPARK-10652 URL: https://issues.apache.org/jira/browse/SPARK-10652 Project: Spark Issue Type: Improvement Components: Streaming Affects Versions: 1.5.0, 1.4.1 Reporter: Tathagata Das Assignee: Tathagata Das Job descriptions will help distinguish jobs of one batch from the other in the Jobs and Stages pages in the Spark UI -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10649) Streaming jobs unexpectedly inherits job group, job descriptions from context starting thread
[ https://issues.apache.org/jira/browse/SPARK-10649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-10649: -- Description: The job group, and job descriptions information is passed through thread local properties, and get inherited by child threads. In case of spark streaming, the streaming jobs inherit these properties from the thread that called streamingContext.start(). This may not make sense. 1. Job group: This is mainly used for cancelling a group of jobs together. It does not make sense to cancel streaming jobs like this, as the effect will be unpredictable. And its not a valid usecase any way, to cancel a streaming context, call streamingContext.stop() 2. Job description: This is used to pass on nice text descriptions for jobs to show up in the UI. The job description of the thread that calls streamingContext.start() is not useful for all the streaming jobs, as it does not make sense for all of the streaming jobs to have the same description, and the description may or may not be related to streaming. was: The job group, job descriptions and scheduler pool information is passed through thread local properties, and get inherited by child threads. In case of spark streaming, the streaming jobs inherit these properties from the thread that called streamingContext.start(). This may not make sense. 1. Job group: This is mainly used for cancelling a group of jobs together. It does not make sense to cancel streaming jobs like this, as the effect will be unpredictable. And its not a valid usecase any way, to cancel a streaming context, call streamingContext.stop() 2. Job description: This is used to pass on nice text descriptions for jobs to show up in the UI. The job description of the thread that calls streamingContext.start() is not useful for all the streaming jobs, as it does not make sense for all of the streaming jobs to have the same description, and the description may or may not be related to streaming. > Streaming jobs unexpectedly inherits job group, job descriptions from context > starting thread > - > > Key: SPARK-10649 > URL: https://issues.apache.org/jira/browse/SPARK-10649 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.3.1, 1.4.1, 1.5.0 >Reporter: Tathagata Das >Assignee: Tathagata Das > > The job group, and job descriptions information is passed through thread > local properties, and get inherited by child threads. In case of spark > streaming, the streaming jobs inherit these properties from the thread that > called streamingContext.start(). This may not make sense. > 1. Job group: This is mainly used for cancelling a group of jobs together. It > does not make sense to cancel streaming jobs like this, as the effect will be > unpredictable. And its not a valid usecase any way, to cancel a streaming > context, call streamingContext.stop() > 2. Job description: This is used to pass on nice text descriptions for jobs > to show up in the UI. The job description of the thread that calls > streamingContext.start() is not useful for all the streaming jobs, as it does > not make sense for all of the streaming jobs to have the same description, > and the description may or may not be related to streaming. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
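The inheritance described in this issue comes from thread-local job properties that child threads copy from the thread that created them. A minimal sketch of that mechanism using a plain InheritableThreadLocal (Spark's own properties object is not reproduced here):
{code}
object InheritedJobPropsSketch {
  private val jobDescription = new InheritableThreadLocal[String] {
    override def initialValue(): String = "<none>"
  }

  def main(args: Array[String]): Unit = {
    // The thread that calls streamingContext.start() happens to have a description set...
    jobDescription.set("nightly batch load")

    // ...and the scheduler thread it spawns silently inherits it, so every streaming job
    // submitted from that thread would show the unrelated description in the UI.
    val schedulerThread = new Thread(new Runnable {
      override def run(): Unit =
        println(s"streaming job runs with description: ${jobDescription.get()}")
    })
    schedulerThread.start()
    schedulerThread.join()
  }
}
{code}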
[jira] [Commented] (SPARK-10381) Infinite loop when OutputCommitCoordination is enabled and OutputCommitter.commitTask throws exception
[ https://issues.apache.org/jira/browse/SPARK-10381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791347#comment-14791347 ] Apache Spark commented on SPARK-10381: -- User 'JoshRosen' has created a pull request for this issue: https://github.com/apache/spark/pull/8789 > Infinite loop when OutputCommitCoordination is enabled and > OutputCommitter.commitTask throws exception > -- > > Key: SPARK-10381 > URL: https://issues.apache.org/jira/browse/SPARK-10381 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 1.3.1, 1.4.1, 1.5.0 >Reporter: Josh Rosen >Assignee: Josh Rosen >Priority: Critical > Fix For: 1.6.0, 1.5.1 > > > When speculative execution is enabled, consider a scenario where the > authorized committer of a particular output partition fails during the > OutputCommitter.commitTask() call. In this case, the OutputCommitCoordinator > is supposed to release that committer's exclusive lock on committing once > that task fails. However, due to a unit mismatch the lock will not be > released, causing Spark to go into an infinite retry loop. > This bug was masked by the fact that the OutputCommitCoordinator does not > have enough end-to-end tests (the current tests use many mocks). Other > factors contributing to this bug are the fact that we have many > similarly-named identifiers that have different semantics but the same data > types (e.g. attemptNumber and taskAttemptId, with inconsistent variable > naming which makes them difficult to distinguish). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10639) Need to convert UDAF's result from scala to sql type
[ https://issues.apache.org/jira/browse/SPARK-10639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10639: Assignee: Apache Spark > Need to convert UDAF's result from scala to sql type > > > Key: SPARK-10639 > URL: https://issues.apache.org/jira/browse/SPARK-10639 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.0 >Reporter: Yin Huai >Assignee: Apache Spark >Priority: Blocker > > We are missing a conversion at > https://github.com/apache/spark/blob/branch-1.5/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/udaf.scala#L427. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10639) Need to convert UDAF's result from scala to sql type
[ https://issues.apache.org/jira/browse/SPARK-10639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791302#comment-14791302 ] Apache Spark commented on SPARK-10639: -- User 'yhuai' has created a pull request for this issue: https://github.com/apache/spark/pull/8788 > Need to convert UDAF's result from scala to sql type > > > Key: SPARK-10639 > URL: https://issues.apache.org/jira/browse/SPARK-10639 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.0 >Reporter: Yin Huai >Priority: Blocker > > We are missing a conversion at > https://github.com/apache/spark/blob/branch-1.5/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/udaf.scala#L427. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10639) Need to convert UDAF's result from scala to sql type
[ https://issues.apache.org/jira/browse/SPARK-10639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10639: Assignee: (was: Apache Spark) > Need to convert UDAF's result from scala to sql type > > > Key: SPARK-10639 > URL: https://issues.apache.org/jira/browse/SPARK-10639 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.0 >Reporter: Yin Huai >Priority: Blocker > > We are missing a conversion at > https://github.com/apache/spark/blob/branch-1.5/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/udaf.scala#L427. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10058) Flaky test: HeartbeatReceiverSuite: normal heartbeat
[ https://issues.apache.org/jira/browse/SPARK-10058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10058: -- Issue Type: Bug (was: Test) > Flaky test: HeartbeatReceiverSuite: normal heartbeat > > > Key: SPARK-10058 > URL: https://issues.apache.org/jira/browse/SPARK-10058 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Reporter: Davies Liu >Assignee: Andrew Or >Priority: Blocker > Labels: flaky-test > > https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-1.5-SBT/116/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/testReport/junit/org.apache.spark/HeartbeatReceiverSuite/normal_heartbeat/ > {code} > Error Message > 3 did not equal 2 > Stacktrace > sbt.ForkMain$ForkError: 3 did not equal 2 > at > org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply$mcV$sp(HeartbeatReceiverSuite.scala:104) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97) > at > org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42) > at > org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) > at > org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255) > at > org.apache.spark.HeartbeatReceiverSuite.runTest(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) > at scala.collection.immutable.List.foreach(List.scala:318) > at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > at > org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) > at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) > at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) > at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) > at org.scalatest.Suite$class.run(Suite.scala:1424) > at > org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at 
org.scalatest.SuperEngine.runImpl(Engine.scala:545) > at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) > at > org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterAll$$super$run(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257) > at > org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256) > at > org.apache.spark.HeartbeatReceiverSuite.run(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) > at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671) > at sbt.ForkMain$Run$2.call(ForkMain.java:294) > at sbt.ForkMain$Run$2.call(ForkMain.java:284) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at >
[jira] [Updated] (SPARK-10650) Spark docs include test and other extra classes
[ https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-10650: -- Target Version/s: 1.6.0, 1.5.1 (was: 1.5.1) > Spark docs include test and other extra classes > --- > > Key: SPARK-10650 > URL: https://issues.apache.org/jira/browse/SPARK-10650 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.0 >Reporter: Patrick Wendell >Assignee: Michael Armbrust >Priority: Critical > > In 1.5.0 there are some extra classes in the Spark docs - including a bunch > of test classes. We need to figure out what commit introduced those and fix > it. The obvious things like genJavadoc version have not changed. > http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ > [before] > http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ > [after] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10058) Flaky test: HeartbeatReceiverSuite: normal heartbeat
[ https://issues.apache.org/jira/browse/SPARK-10058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10058: -- Component/s: Tests > Flaky test: HeartbeatReceiverSuite: normal heartbeat > > > Key: SPARK-10058 > URL: https://issues.apache.org/jira/browse/SPARK-10058 > Project: Spark > Issue Type: Test > Components: Spark Core, Tests >Reporter: Davies Liu >Assignee: Andrew Or >Priority: Blocker > Labels: flaky-test > > https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-1.5-SBT/116/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/testReport/junit/org.apache.spark/HeartbeatReceiverSuite/normal_heartbeat/ > {code} > Error Message > 3 did not equal 2 > Stacktrace > sbt.ForkMain$ForkError: 3 did not equal 2 > at > org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply$mcV$sp(HeartbeatReceiverSuite.scala:104) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97) > at > org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42) > at > org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) > at > org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255) > at > org.apache.spark.HeartbeatReceiverSuite.runTest(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) > at scala.collection.immutable.List.foreach(List.scala:318) > at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > at > org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) > at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) > at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) > at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) > at org.scalatest.Suite$class.run(Suite.scala:1424) > at > org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at 
org.scalatest.SuperEngine.runImpl(Engine.scala:545) > at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) > at > org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterAll$$super$run(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257) > at > org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256) > at > org.apache.spark.HeartbeatReceiverSuite.run(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) > at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671) > at sbt.ForkMain$Run$2.call(ForkMain.java:294) > at sbt.ForkMain$Run$2.call(ForkMain.java:284) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.uti
[jira] [Updated] (SPARK-10651) Flaky test: BroadcastSuite
[ https://issues.apache.org/jira/browse/SPARK-10651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10651: -- Labels: flaky-test (was: ) > Flaky test: BroadcastSuite > -- > > Key: SPARK-10651 > URL: https://issues.apache.org/jira/browse/SPARK-10651 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 1.6.0 >Reporter: Xiangrui Meng >Assignee: Shixiong Zhu >Priority: Blocker > Labels: flaky-test > Attachments: BroadcastSuiteFailures.csv > > > Saw many failures recently in master build. See attached CSV for a full list. > Most of the error messages are: > {code} > Can't find 2 executors before 1 milliseconds elapsed > {code} > . -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10058) Flaky test: HeartbeatReceiverSuite: normal heartbeat
[ https://issues.apache.org/jira/browse/SPARK-10058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791286#comment-14791286 ] Xiangrui Meng commented on SPARK-10058: --- Changed the priority to Blocker since this failed master builds frequently. > Flaky test: HeartbeatReceiverSuite: normal heartbeat > > > Key: SPARK-10058 > URL: https://issues.apache.org/jira/browse/SPARK-10058 > Project: Spark > Issue Type: Test > Components: Spark Core >Reporter: Davies Liu >Assignee: Andrew Or >Priority: Blocker > Labels: flaky-test > > https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-1.5-SBT/116/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/testReport/junit/org.apache.spark/HeartbeatReceiverSuite/normal_heartbeat/ > {code} > Error Message > 3 did not equal 2 > Stacktrace > sbt.ForkMain$ForkError: 3 did not equal 2 > at > org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply$mcV$sp(HeartbeatReceiverSuite.scala:104) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97) > at > org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42) > at > org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) > at > org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255) > at > org.apache.spark.HeartbeatReceiverSuite.runTest(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) > at scala.collection.immutable.List.foreach(List.scala:318) > at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > at > org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) > at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) > at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) > at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) > at org.scalatest.Suite$class.run(Suite.scala:1424) > at > org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) > at > 
org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at org.scalatest.SuperEngine.runImpl(Engine.scala:545) > at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) > at > org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterAll$$super$run(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257) > at > org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256) > at > org.apache.spark.HeartbeatReceiverSuite.run(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) > at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671) > at sbt.ForkMain$Run$2.call(ForkMain.java:294) > at sbt.ForkMain$Run$2.call(ForkMain.java:284) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at
[jira] [Assigned] (SPARK-10650) Spark docs include test and other extra classes
[ https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10650: Assignee: Apache Spark (was: Michael Armbrust) > Spark docs include test and other extra classes > --- > > Key: SPARK-10650 > URL: https://issues.apache.org/jira/browse/SPARK-10650 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.0 >Reporter: Patrick Wendell >Assignee: Apache Spark >Priority: Critical > > In 1.5.0 there are some extra classes in the Spark docs - including a bunch > of test classes. We need to figure out what commit introduced those and fix > it. The obvious things like genJavadoc version have not changed. > http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ > [before] > http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ > [after] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10650) Spark docs include test and other extra classes
[ https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791285#comment-14791285 ] Apache Spark commented on SPARK-10650: -- User 'marmbrus' has created a pull request for this issue: https://github.com/apache/spark/pull/8787 > Spark docs include test and other extra classes > --- > > Key: SPARK-10650 > URL: https://issues.apache.org/jira/browse/SPARK-10650 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.0 >Reporter: Patrick Wendell >Assignee: Michael Armbrust >Priority: Critical > > In 1.5.0 there are some extra classes in the Spark docs - including a bunch > of test classes. We need to figure out what commit introduced those and fix > it. The obvious things like genJavadoc version have not changed. > http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ > [before] > http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ > [after] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10650) Spark docs include test and other extra classes
[ https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10650: Assignee: Michael Armbrust (was: Apache Spark) > Spark docs include test and other extra classes > --- > > Key: SPARK-10650 > URL: https://issues.apache.org/jira/browse/SPARK-10650 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.0 >Reporter: Patrick Wendell >Assignee: Michael Armbrust >Priority: Critical > > In 1.5.0 there are some extra classes in the Spark docs - including a bunch > of test classes. We need to figure out what commit introduced those and fix > it. The obvious things like genJavadoc version have not changed. > http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ > [before] > http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ > [after] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10058) Flaky test: HeartbeatReceiverSuite: normal heartbeat
[ https://issues.apache.org/jira/browse/SPARK-10058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10058: -- Priority: Blocker (was: Critical) > Flaky test: HeartbeatReceiverSuite: normal heartbeat > > > Key: SPARK-10058 > URL: https://issues.apache.org/jira/browse/SPARK-10058 > Project: Spark > Issue Type: Test > Components: Spark Core >Reporter: Davies Liu >Assignee: Andrew Or >Priority: Blocker > Labels: flaky-test > > https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-1.5-SBT/116/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/testReport/junit/org.apache.spark/HeartbeatReceiverSuite/normal_heartbeat/ > {code} > Error Message > 3 did not equal 2 > Stacktrace > sbt.ForkMain$ForkError: 3 did not equal 2 > at > org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply$mcV$sp(HeartbeatReceiverSuite.scala:104) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97) > at > org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42) > at > org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) > at > org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255) > at > org.apache.spark.HeartbeatReceiverSuite.runTest(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) > at scala.collection.immutable.List.foreach(List.scala:318) > at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > at > org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) > at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) > at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) > at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) > at org.scalatest.Suite$class.run(Suite.scala:1424) > at > org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at 
org.scalatest.SuperEngine.runImpl(Engine.scala:545) > at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) > at > org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterAll$$super$run(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257) > at > org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256) > at > org.apache.spark.HeartbeatReceiverSuite.run(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) > at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671) > at sbt.ForkMain$Run$2.call(ForkMain.java:294) > at sbt.ForkMain$Run$2.call(ForkMain.java:284) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at >
[jira] [Updated] (SPARK-10651) Flaky test: BroadcastSuite
[ https://issues.apache.org/jira/browse/SPARK-10651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10651: -- Description: Saw many failures recently in master build. See attached CSV for a full list. Most of the error messages are: {code} Can't find 2 executors before 1 milliseconds elapsed {code} . was: Saw many failures recently in master build. See attached CSV for a full list. Most of the error messages are: Can't find 2 executors before 1 milliseconds elapsed . > Flaky test: BroadcastSuite > -- > > Key: SPARK-10651 > URL: https://issues.apache.org/jira/browse/SPARK-10651 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 1.6.0 >Reporter: Xiangrui Meng >Assignee: Shixiong Zhu >Priority: Blocker > Attachments: BroadcastSuiteFailures.csv > > > Saw many failures recently in master build. See attached CSV for a full list. > Most of the error messages are: > {code} > Can't find 2 executors before 1 milliseconds elapsed > {code} > . -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10651) Flaky test: BroadcastSuite
[ https://issues.apache.org/jira/browse/SPARK-10651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10651: -- Description: Saw many failures recently in master build. See attached CSV for a full list. Most of the error messages are: Can't find 2 executors before 1 milliseconds elapsed . was: Saw many failures recently in master build. See attached CSV for a full list. > Flaky test: BroadcastSuite > -- > > Key: SPARK-10651 > URL: https://issues.apache.org/jira/browse/SPARK-10651 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 1.6.0 >Reporter: Xiangrui Meng >Assignee: Shixiong Zhu >Priority: Blocker > Attachments: BroadcastSuiteFailures.csv > > > Saw many failures recently in master build. See attached CSV for a full list. > Most of the error messages are: Can't find 2 executors before 1 > milliseconds elapsed > . -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10651) Flaky test: BroadcastSuite
[ https://issues.apache.org/jira/browse/SPARK-10651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10651: -- Attachment: BroadcastSuiteFailures.csv > Flaky test: BroadcastSuite > -- > > Key: SPARK-10651 > URL: https://issues.apache.org/jira/browse/SPARK-10651 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 1.6.0 >Reporter: Xiangrui Meng >Assignee: Shixiong Zhu >Priority: Blocker > Attachments: BroadcastSuiteFailures.csv > > > Saw many failures recently in master build. See attached CSV for a full list. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10651) Flaky test: BroadcastSuite
Xiangrui Meng created SPARK-10651: - Summary: Flaky test: BroadcastSuite Key: SPARK-10651 URL: https://issues.apache.org/jira/browse/SPARK-10651 Project: Spark Issue Type: Bug Components: Spark Core, Tests Affects Versions: 1.6.0 Reporter: Xiangrui Meng Assignee: Shixiong Zhu Priority: Blocker Saw many failures recently in master build. See attached CSV for a full list. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10625) Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds unserializable objects into connection properties
[ https://issues.apache.org/jira/browse/SPARK-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791252#comment-14791252 ] Peng Cheng commented on SPARK-10625: A pull request has been sent that contains 2 extra unit tests and a simple fix: https://github.com/apache/spark/pull/8785 Can you help me validate it and merge it into 1.5.1? > Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds > unserializable objects into connection properties > -- > > Key: SPARK-10625 > URL: https://issues.apache.org/jira/browse/SPARK-10625 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.1, 1.5.0 > Environment: Ubuntu 14.04 >Reporter: Peng Cheng > Labels: jdbc, spark, sparksql > > Some JDBC drivers (e.g. SAP HANA) tries to optimize connection pooling by > adding new objects into the connection properties, which is then reused by > Spark to be deployed to workers. When some of these new objects are unable to > be serializable it will trigger an org.apache.spark.SparkException: Task not > serializable. The following test code snippet demonstrate this problem by > using a modified H2 driver: > test("INSERT to JDBC Datasource with UnserializableH2Driver") { > object UnserializableH2Driver extends org.h2.Driver { > override def connect(url: String, info: Properties): Connection = { > val result = super.connect(url, info) > info.put("unserializableDriver", this) > result > } > override def getParentLogger: Logger = ??? > } > import scala.collection.JavaConversions._ > val oldDrivers = > DriverManager.getDrivers.filter(_.acceptsURL("jdbc:h2:")).toSeq > oldDrivers.foreach{ > DriverManager.deregisterDriver > } > DriverManager.registerDriver(UnserializableH2Driver) > sql("INSERT INTO TABLE PEOPLE1 SELECT * FROM PEOPLE") > assert(2 === sqlContext.read.jdbc(url1, "TEST.PEOPLE1", properties).count) > assert(2 === sqlContext.read.jdbc(url1, "TEST.PEOPLE1", > properties).collect()(0).length) > DriverManager.deregisterDriver(UnserializableH2Driver) > oldDrivers.foreach{ > DriverManager.registerDriver > } > } -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
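One possible mitigation, sketched here as an assumption rather than as the content of the pull request above: copy only the string-valued entries out of the connection Properties before they are captured by a task closure, so driver-injected objects such as the UnserializableH2Driver in the test never reach the serializer.
{code}
import java.util.Properties
import scala.collection.JavaConverters._

object SerializableJdbcPropsSketch {
  // Returns a Properties object containing only String keys/values; any non-String
  // objects a JDBC driver stuffed into the original map are dropped.
  def stringOnly(props: Properties): Properties = {
    val out = new Properties()
    for (key <- props.stringPropertyNames().asScala) {
      out.setProperty(key, props.getProperty(key))
    }
    out
  }

  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.setProperty("user", "sa")
    props.put("unserializableDriver", new Object()) // mimics the driver's injected object
    println(stringOnly(props).stringPropertyNames()) // [user]
  }
}
{code}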
[jira] [Resolved] (SPARK-9794) ISO DateTime parser is too strict
[ https://issues.apache.org/jira/browse/SPARK-9794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-9794. Resolution: Fixed Fix Version/s: 1.6.0 > ISO DateTime parser is too strict > - > > Key: SPARK-9794 > URL: https://issues.apache.org/jira/browse/SPARK-9794 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.2.2, 1.3.1, 1.4.1, 1.5.0 >Reporter: Alex Angelini >Assignee: Kevin Cox > Fix For: 1.6.0 > > > The DateTime parser requires 3 millisecond digits, but that is not part of > the official ISO8601 spec. > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L132 > https://en.wikipedia.org/wiki/ISO_8601 > This results in the following exception when trying to parse datetime columns > {code} > java.text.ParseException: Unparseable date: "0001-01-01T00:00:00GMT-00:00" > {code} > [~joshrosen] [~rxin] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9794) ISO DateTime parser is too strict
[ https://issues.apache.org/jira/browse/SPARK-9794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-9794: --- Assignee: Kevin Cox > ISO DateTime parser is too strict > - > > Key: SPARK-9794 > URL: https://issues.apache.org/jira/browse/SPARK-9794 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.2.2, 1.3.1, 1.4.1, 1.5.0 >Reporter: Alex Angelini >Assignee: Kevin Cox > Fix For: 1.6.0 > > > The DateTime parser requires 3 millisecond digits, but that is not part of > the official ISO8601 spec. > https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L132 > https://en.wikipedia.org/wiki/ISO_8601 > This results in the following exception when trying to parse datetime columns > {code} > java.text.ParseException: Unparseable date: "0001-01-01T00:00:00GMT-00:00" > {code} > [~joshrosen] [~rxin] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
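A small illustration of the strictness being reported, using plain java.text.SimpleDateFormat rather than Spark's actual parser in DateTimeUtils:

{code}
import java.text.SimpleDateFormat

// A pattern that insists on fractional seconds rejects timestamps without them:
val strict = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS")
// strict.parse("0001-01-01T00:00:00")  // throws java.text.ParseException: Unparseable date

// Dropping the mandatory ".SSS" accepts the same input; ISO 8601 does not
// require millisecond digits, which is what the fix accounts for.
val lenient = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss")
lenient.parse("0001-01-01T00:00:00")
{code}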
[jira] [Updated] (SPARK-10623) turning on predicate pushdown throws nonsuch element exception when RDD is empty
[ https://issues.apache.org/jira/browse/SPARK-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-10623: Target Version/s: 1.6.0, 1.5.1 > turning on predicate pushdown throws nonsuch element exception when RDD is > empty > - > > Key: SPARK-10623 > URL: https://issues.apache.org/jira/browse/SPARK-10623 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Ram Sriharsha >Assignee: Zhan Zhang > > Turning on predicate pushdown for ORC datasources results in a > NoSuchElementException: > scala> val df = sqlContext.sql("SELECT name FROM people WHERE age < 15") > df: org.apache.spark.sql.DataFrame = [name: string] > scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "true") > scala> df.explain > == Physical Plan == > java.util.NoSuchElementException > Disabling the pushdown makes things work again: > scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "false") > scala> df.explain > == Physical Plan == > Project [name#6] > Filter (age#7 < 15) > Scan > OrcRelation[file:/home/mydir/spark-1.5.0-SNAPSHOT/test/people][name#6,age#7] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
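For illustration only, a guess at how an empty input can surface as java.util.NoSuchElementException during planning; this is an assumption, not the confirmed root cause addressed by the fix. Taking the head of an empty collection of pushed filters raises exactly this exception, while the Option-returning variant does not:

{code}
// Not Spark's ORC code; just the exception class in isolation.
val pushedFilters: Seq[String] = Seq.empty
// pushedFilters.head                  // java.util.NoSuchElementException: head of empty list
val first = pushedFilters.headOption   // None when nothing was pushed down
{code}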
[jira] [Updated] (SPARK-10650) Spark docs include test and other extra classes
[ https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-10650: - Assignee: Michael Armbrust (was: Andrew Or) > Spark docs include test and other extra classes > --- > > Key: SPARK-10650 > URL: https://issues.apache.org/jira/browse/SPARK-10650 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.0 >Reporter: Patrick Wendell >Assignee: Michael Armbrust >Priority: Critical > > In 1.5.0 there are some extra classes in the Spark docs - including a bunch > of test classes. We need to figure out what commit introduced those and fix > it. The obvious things like genJavadoc version have not changed. > http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ > [before] > http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ > [after] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10623) turning on predicate pushdown throws nonsuch element exception when RDD is empty
[ https://issues.apache.org/jira/browse/SPARK-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791179#comment-14791179 ] Apache Spark commented on SPARK-10623: -- User 'zhzhan' has created a pull request for this issue: https://github.com/apache/spark/pull/8783 > turning on predicate pushdown throws nonsuch element exception when RDD is > empty > - > > Key: SPARK-10623 > URL: https://issues.apache.org/jira/browse/SPARK-10623 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Ram Sriharsha >Assignee: Zhan Zhang > > Turning on predicate pushdown for ORC datasources results in a > NoSuchElementException: > scala> val df = sqlContext.sql("SELECT name FROM people WHERE age < 15") > df: org.apache.spark.sql.DataFrame = [name: string] > scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "true") > scala> df.explain > == Physical Plan == > java.util.NoSuchElementException > Disabling the pushdown makes things work again: > scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "false") > scala> df.explain > == Physical Plan == > Project [name#6] > Filter (age#7 < 15) > Scan > OrcRelation[file:/home/mydir/spark-1.5.0-SNAPSHOT/test/people][name#6,age#7] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10623) turning on predicate pushdown throws nonsuch element exception when RDD is empty
[ https://issues.apache.org/jira/browse/SPARK-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10623: Assignee: Apache Spark (was: Zhan Zhang) > turning on predicate pushdown throws nonsuch element exception when RDD is > empty > - > > Key: SPARK-10623 > URL: https://issues.apache.org/jira/browse/SPARK-10623 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Ram Sriharsha >Assignee: Apache Spark > > Turning on predicate pushdown for ORC datasources results in a > NoSuchElementException: > scala> val df = sqlContext.sql("SELECT name FROM people WHERE age < 15") > df: org.apache.spark.sql.DataFrame = [name: string] > scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "true") > scala> df.explain > == Physical Plan == > java.util.NoSuchElementException > Disabling the pushdown makes things work again: > scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "false") > scala> df.explain > == Physical Plan == > Project [name#6] > Filter (age#7 < 15) > Scan > OrcRelation[file:/home/mydir/spark-1.5.0-SNAPSHOT/test/people][name#6,age#7] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10623) turning on predicate pushdown throws nonsuch element exception when RDD is empty
[ https://issues.apache.org/jira/browse/SPARK-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10623: Assignee: Zhan Zhang (was: Apache Spark) > turning on predicate pushdown throws nonsuch element exception when RDD is > empty > - > > Key: SPARK-10623 > URL: https://issues.apache.org/jira/browse/SPARK-10623 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Ram Sriharsha >Assignee: Zhan Zhang > > Turning on predicate pushdown for ORC datasources results in a > NoSuchElementException: > scala> val df = sqlContext.sql("SELECT name FROM people WHERE age < 15") > df: org.apache.spark.sql.DataFrame = [name: string] > scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "true") > scala> df.explain > == Physical Plan == > java.util.NoSuchElementException > Disabling the pushdown makes things work again: > scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "false") > scala> df.explain > == Physical Plan == > Project [name#6] > Filter (age#7 < 15) > Scan > OrcRelation[file:/home/mydir/spark-1.5.0-SNAPSHOT/test/people][name#6,age#7] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3978) Schema change on Spark-Hive (Parquet file format) table not working
[ https://issues.apache.org/jira/browse/SPARK-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791171#comment-14791171 ] Nilesh Barge commented on SPARK-3978: - I tested with the latest Spark 1.5 release... I got the source (http://www.apache.org/dyn/closer.lua/spark/spark-1.5.0/spark-1.5.0.tgz) and then build with "mvn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests clean package" command... and then ran my original tests... > Schema change on Spark-Hive (Parquet file format) table not working > --- > > Key: SPARK-3978 > URL: https://issues.apache.org/jira/browse/SPARK-3978 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.1.0 >Reporter: Nilesh Barge >Assignee: Alex Rovner > Fix For: 1.5.0 > > > On following releases: > Spark 1.1.0 (built using sbt/sbt -Dhadoop.version=2.2.0 -Phive assembly) , > Apache HDFS 2.2 > Spark job is able to create/add/read data in hive, parquet formatted, tables > using HiveContext. > But, after changing schema, spark job is not able to read data and throws > following exception: > java.lang.ArrayIndexOutOfBoundsException: 2 > at > org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.getStructFieldData(ArrayWritableObjectInspector.java:127) > > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:284) > > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:278) > > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at > scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) > at > scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) > at scala.collection.AbstractIterator.to(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) > at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) > at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) > at org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:774) > at org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:774) > at > org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) > > at > org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) > > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) > at org.apache.spark.scheduler.Task.run(Task.scala:54) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > > at java.lang.Thread.run(Thread.java:744) > code snippet in short: > hiveContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS people_table (name > String, age INT) ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe' > STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat' > OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'"); > hiveContext.sql("INSERT INTO TABLE people_table SELECT name, age 
FROM > temp_table_people1"); > hiveContext.sql("SELECT * FROM people_table"); //Here, data read was > successful. > hiveContext.sql("ALTER TABLE people_table ADD COLUMNS (gender STRING)"); > hiveContext.sql("SELECT * FROM people_table"); //Not able to read existing > data and ArrayIndexOutOfBoundsException is thrown. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10645) Bivariate Statistics for continuous vs. continuous
[ https://issues.apache.org/jira/browse/SPARK-10645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jihong MA updated SPARK-10645: -- Component/s: SQL ML > Bivariate Statistics for continuous vs. continuous > -- > > Key: SPARK-10645 > URL: https://issues.apache.org/jira/browse/SPARK-10645 > Project: Spark > Issue Type: Sub-task > Components: ML, SQL >Reporter: Jihong MA > > This is an umbrella JIRA, which covers Bivariate Statistics for continuous > vs. continuous columns, including covariance, Pearson's correlation, > Spearman's correlation (for both continuous & categorical). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
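For reference, the RDD-based MLlib API already exposes the statistics this umbrella covers; a short sketch (the DataFrame/SQL-side work tracked here is not shown):

{code}
import org.apache.spark.mllib.stat.Statistics
import org.apache.spark.rdd.RDD

// Pearson and Spearman correlation between two numeric columns held as RDD[Double].
def correlations(x: RDD[Double], y: RDD[Double]): (Double, Double) =
  (Statistics.corr(x, y, "pearson"), Statistics.corr(x, y, "spearman"))
{code}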
[jira] [Commented] (SPARK-3978) Schema change on Spark-Hive (Parquet file format) table not working
[ https://issues.apache.org/jira/browse/SPARK-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791150#comment-14791150 ] Alex Rovner commented on SPARK-3978: [~barge.nilesh] What version of Spark have you tested with? > Schema change on Spark-Hive (Parquet file format) table not working > --- > > Key: SPARK-3978 > URL: https://issues.apache.org/jira/browse/SPARK-3978 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.1.0 >Reporter: Nilesh Barge >Assignee: Alex Rovner > Fix For: 1.5.0 > > > On following releases: > Spark 1.1.0 (built using sbt/sbt -Dhadoop.version=2.2.0 -Phive assembly) , > Apache HDFS 2.2 > Spark job is able to create/add/read data in hive, parquet formatted, tables > using HiveContext. > But, after changing schema, spark job is not able to read data and throws > following exception: > java.lang.ArrayIndexOutOfBoundsException: 2 > at > org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.getStructFieldData(ArrayWritableObjectInspector.java:127) > > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:284) > > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:278) > > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at > scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) > at > scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) > at scala.collection.AbstractIterator.to(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) > at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) > at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) > at org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:774) > at org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:774) > at > org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) > > at > org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) > > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) > at org.apache.spark.scheduler.Task.run(Task.scala:54) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > > at java.lang.Thread.run(Thread.java:744) > code snippet in short: > hiveContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS people_table (name > String, age INT) ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe' > STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat' > OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'"); > hiveContext.sql("INSERT INTO TABLE people_table SELECT name, age FROM > temp_table_people1"); > hiveContext.sql("SELECT * FROM people_table"); //Here, data read was > successful. 
> hiveContext.sql("ALTER TABLE people_table ADD COLUMNS (gender STRING)"); > hiveContext.sql("SELECT * FROM people_table"); //Not able to read existing > data and ArrayIndexOutOfBoundsException is thrown. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3978) Schema change on Spark-Hive (Parquet file format) table not working
[ https://issues.apache.org/jira/browse/SPARK-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791141#comment-14791141 ] Nilesh Barge commented on SPARK-3978: - Thanks for resolving this, I also verified on my end and now it is working fine > Schema change on Spark-Hive (Parquet file format) table not working > --- > > Key: SPARK-3978 > URL: https://issues.apache.org/jira/browse/SPARK-3978 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.1.0 >Reporter: Nilesh Barge >Assignee: Alex Rovner > Fix For: 1.5.0 > > > On following releases: > Spark 1.1.0 (built using sbt/sbt -Dhadoop.version=2.2.0 -Phive assembly) , > Apache HDFS 2.2 > Spark job is able to create/add/read data in hive, parquet formatted, tables > using HiveContext. > But, after changing schema, spark job is not able to read data and throws > following exception: > java.lang.ArrayIndexOutOfBoundsException: 2 > at > org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.getStructFieldData(ArrayWritableObjectInspector.java:127) > > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:284) > > at > org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:278) > > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$class.foreach(Iterator.scala:727) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) > at > scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) > at > scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) > at scala.collection.AbstractIterator.to(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) > at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) > at > scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) > at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) > at org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:774) > at org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:774) > at > org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) > > at > org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121) > > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) > at org.apache.spark.scheduler.Task.run(Task.scala:54) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > > at java.lang.Thread.run(Thread.java:744) > code snippet in short: > hiveContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS people_table (name > String, age INT) ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe' > STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat' > OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'"); > hiveContext.sql("INSERT INTO TABLE people_table SELECT name, age FROM > temp_table_people1"); > hiveContext.sql("SELECT * FROM people_table"); //Here, data read was > successful. 
> hiveContext.sql("ALTER TABLE people_table ADD COLUMNS (gender STRING)"); > hiveContext.sql("SELECT * FROM people_table"); //Not able to read existing > data and ArrayIndexOutOfBoundsException is thrown. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10626) Create a Java friendly method for randomRDD & RandomDataGenerator on RandomRDDs.
[ https://issues.apache.org/jira/browse/SPARK-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10626: Assignee: Apache Spark > Create a Java friendly method for randomRDD & RandomDataGenerator on > RandomRDDs. > > > Key: SPARK-10626 > URL: https://issues.apache.org/jira/browse/SPARK-10626 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: holdenk >Assignee: Apache Spark >Priority: Minor > > SPARK-3136 added a large number of functions for creating Java RandomRDDs, > but for people that want to use custom RandomDataGenerators we should make a > Java friendly method. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10626) Create a Java friendly method for randomRDD & RandomDataGenerator on RandomRDDs.
[ https://issues.apache.org/jira/browse/SPARK-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10626: Assignee: (was: Apache Spark) > Create a Java friendly method for randomRDD & RandomDataGenerator on > RandomRDDs. > > > Key: SPARK-10626 > URL: https://issues.apache.org/jira/browse/SPARK-10626 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: holdenk >Priority: Minor > > SPARK-3136 added a large number of functions for creating Java RandomRDDs, > but for people that want to use custom RandomDataGenerators we should make a > Java friendly method. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10626) Create a Java friendly method for randomRDD & RandomDataGenerator on RandomRDDs.
[ https://issues.apache.org/jira/browse/SPARK-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791128#comment-14791128 ] Apache Spark commented on SPARK-10626: -- User 'holdenk' has created a pull request for this issue: https://github.com/apache/spark/pull/8782 > Create a Java friendly method for randomRDD & RandomDataGenerator on > RandomRDDs. > > > Key: SPARK-10626 > URL: https://issues.apache.org/jira/browse/SPARK-10626 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: holdenk >Priority: Minor > > SPARK-3136 added a large number of functions for creating Java RandomRDDs, > but for people that want to use custom RandomDataGenerators we should make a > Java friendly method. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
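A sketch of the Scala-side usage the issue wants mirrored for Java callers. The generator class below is made up; RandomRDDs.randomRDD and RandomDataGenerator are the existing 1.x MLlib API, and the proposed Java-friendly wrapper itself is not shown:

{code}
import org.apache.spark.SparkContext
import org.apache.spark.mllib.random.{RandomDataGenerator, RandomRDDs}

// A custom generator: 0/1 coin flips.
class CoinFlipGenerator extends RandomDataGenerator[Int] {
  private val rng = new java.util.Random()
  override def nextValue(): Int = if (rng.nextBoolean()) 1 else 0
  override def setSeed(seed: Long): Unit = rng.setSeed(seed)
  override def copy(): CoinFlipGenerator = new CoinFlipGenerator
}

// Scala callers can pass the generator straight to randomRDD; the JIRA asks
// for an equally convenient entry point for Java callers.
def flips(sc: SparkContext) =
  RandomRDDs.randomRDD(sc, new CoinFlipGenerator, size = 1000L, numPartitions = 4, seed = 42L)
{code}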
[jira] [Updated] (SPARK-10650) Spark docs include test and other extra classes
[ https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-10650: Target Version/s: 1.5.1 > Spark docs include test and other extra classes > --- > > Key: SPARK-10650 > URL: https://issues.apache.org/jira/browse/SPARK-10650 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.0 >Reporter: Patrick Wendell >Assignee: Andrew Or >Priority: Critical > > In 1.5.0 there are some extra classes in the Spark docs - including a bunch > of test classes. We need to figure out what commit introduced those and fix > it. The obvious things like genJavadoc version have not changed. > http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ > [before] > http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ > [after] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10650) Spark docs include test and other extra classes
[ https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-10650: Priority: Critical (was: Major) > Spark docs include test and other extra classes > --- > > Key: SPARK-10650 > URL: https://issues.apache.org/jira/browse/SPARK-10650 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.0 >Reporter: Patrick Wendell >Assignee: Andrew Or >Priority: Critical > > In 1.5.0 there are some extra classes in the Spark docs - including a bunch > of test classes. We need to figure out what commit introduced those and fix > it. The obvious things like genJavadoc version have not changed. > http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ > [before] > http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ > [after] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10650) Spark docs include test and other extra classes
[ https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-10650: Description: In 1.5.0 there are some extra classes in the Spark docs - including a bunch of test classes. We need to figure out what commit introduced those and fix it. The obvious things like genJavadoc version have not changed. http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ [before] http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ [after] > Spark docs include test and other extra classes > --- > > Key: SPARK-10650 > URL: https://issues.apache.org/jira/browse/SPARK-10650 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.0 >Reporter: Patrick Wendell >Assignee: Andrew Or > > In 1.5.0 there are some extra classes in the Spark docs - including a bunch > of test classes. We need to figure out what commit introduced those and fix > it. The obvious things like genJavadoc version have not changed. > http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ > [before] > http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ > [after] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10650) Spark docs include test and other extra classes
[ https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-10650: Affects Version/s: 1.5.0 > Spark docs include test and other extra classes > --- > > Key: SPARK-10650 > URL: https://issues.apache.org/jira/browse/SPARK-10650 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.0 >Reporter: Patrick Wendell >Assignee: Andrew Or > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10650) Spark docs include test and other extra classes
Patrick Wendell created SPARK-10650: --- Summary: Spark docs include test and other extra classes Key: SPARK-10650 URL: https://issues.apache.org/jira/browse/SPARK-10650 Project: Spark Issue Type: Bug Components: Documentation Reporter: Patrick Wendell Assignee: Andrew Or -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10649) Streaming jobs unexpectedly inherits job group, job descriptions from context starting thread
[ https://issues.apache.org/jira/browse/SPARK-10649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10649: Assignee: Apache Spark (was: Tathagata Das) > Streaming jobs unexpectedly inherits job group, job descriptions from context > starting thread > - > > Key: SPARK-10649 > URL: https://issues.apache.org/jira/browse/SPARK-10649 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.3.1, 1.4.1, 1.5.0 >Reporter: Tathagata Das >Assignee: Apache Spark > > The job group, job descriptions and scheduler pool information is passed > through thread local properties, and get inherited by child threads. In case > of spark streaming, the streaming jobs inherit these properties from the > thread that called streamingContext.start(). This may not make sense. > 1. Job group: This is mainly used for cancelling a group of jobs together. It > does not make sense to cancel streaming jobs like this, as the effect will be > unpredictable. And its not a valid usecase any way, to cancel a streaming > context, call streamingContext.stop() > 2. Job description: This is used to pass on nice text descriptions for jobs > to show up in the UI. The job description of the thread that calls > streamingContext.start() is not useful for all the streaming jobs, as it does > not make sense for all of the streaming jobs to have the same description, > and the description may or may not be related to streaming. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10649) Streaming jobs unexpectedly inherits job group, job descriptions from context starting thread
[ https://issues.apache.org/jira/browse/SPARK-10649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10649: Assignee: Tathagata Das (was: Apache Spark) > Streaming jobs unexpectedly inherits job group, job descriptions from context > starting thread > - > > Key: SPARK-10649 > URL: https://issues.apache.org/jira/browse/SPARK-10649 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.3.1, 1.4.1, 1.5.0 >Reporter: Tathagata Das >Assignee: Tathagata Das > > The job group, job descriptions and scheduler pool information is passed > through thread local properties, and get inherited by child threads. In case > of spark streaming, the streaming jobs inherit these properties from the > thread that called streamingContext.start(). This may not make sense. > 1. Job group: This is mainly used for cancelling a group of jobs together. It > does not make sense to cancel streaming jobs like this, as the effect will be > unpredictable. And its not a valid usecase any way, to cancel a streaming > context, call streamingContext.stop() > 2. Job description: This is used to pass on nice text descriptions for jobs > to show up in the UI. The job description of the thread that calls > streamingContext.start() is not useful for all the streaming jobs, as it does > not make sense for all of the streaming jobs to have the same description, > and the description may or may not be related to streaming. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10649) Streaming jobs unexpectedly inherits job group, job descriptions from context starting thread
[ https://issues.apache.org/jira/browse/SPARK-10649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791109#comment-14791109 ] Apache Spark commented on SPARK-10649: -- User 'tdas' has created a pull request for this issue: https://github.com/apache/spark/pull/8781 > Streaming jobs unexpectedly inherits job group, job descriptions from context > starting thread > - > > Key: SPARK-10649 > URL: https://issues.apache.org/jira/browse/SPARK-10649 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.3.1, 1.4.1, 1.5.0 >Reporter: Tathagata Das >Assignee: Tathagata Das > > The job group, job descriptions and scheduler pool information is passed > through thread local properties, and get inherited by child threads. In case > of spark streaming, the streaming jobs inherit these properties from the > thread that called streamingContext.start(). This may not make sense. > 1. Job group: This is mainly used for cancelling a group of jobs together. It > does not make sense to cancel streaming jobs like this, as the effect will be > unpredictable. And its not a valid usecase any way, to cancel a streaming > context, call streamingContext.stop() > 2. Job description: This is used to pass on nice text descriptions for jobs > to show up in the UI. The job description of the thread that calls > streamingContext.start() is not useful for all the streaming jobs, as it does > not make sense for all of the streaming jobs to have the same description, > and the description may or may not be related to streaming. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10646) Bivariate Statistics: Pearson's Chi-Squared Test for categorical vs. categorical
[ https://issues.apache.org/jira/browse/SPARK-10646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jihong MA updated SPARK-10646: -- Component/s: SQL ML > Bivariate Statistics: Pearson's Chi-Squared Test for categorical vs. > categorical > > > Key: SPARK-10646 > URL: https://issues.apache.org/jira/browse/SPARK-10646 > Project: Spark > Issue Type: Sub-task > Components: ML, SQL >Reporter: Jihong MA > > Pearson's chi-squared goodness of fit test for observed against the expected > distribution. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
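For reference, a sketch with the existing RDD-based MLlib API; the SQL-side variant this sub-task proposes is not shown:

{code}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.stat.Statistics

// Goodness-of-fit of observed counts against the default (uniform) expected
// distribution; the result carries the statistic, degrees of freedom and p-value.
val observed = Vectors.dense(4.0, 6.0, 5.0)
val result = Statistics.chiSqTest(observed)
println(s"statistic=${result.statistic}, pValue=${result.pValue}")
{code}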
[jira] [Created] (SPARK-10649) Streaming jobs unexpectedly inherits job group, job descriptions from context starting thread
Tathagata Das created SPARK-10649: - Summary: Streaming jobs unexpectedly inherits job group, job descriptions from context starting thread Key: SPARK-10649 URL: https://issues.apache.org/jira/browse/SPARK-10649 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.5.0, 1.4.1, 1.3.1 Reporter: Tathagata Das Assignee: Tathagata Das The job group, job descriptions and scheduler pool information is passed through thread local properties, and get inherited by child threads. In case of spark streaming, the streaming jobs inherit these properties from the thread that called streamingContext.start(). This may not make sense. 1. Job group: This is mainly used for cancelling a group of jobs together. It does not make sense to cancel streaming jobs like this, as the effect will be unpredictable. And its not a valid usecase any way, to cancel a streaming context, call streamingContext.stop() 2. Job description: This is used to pass on nice text descriptions for jobs to show up in the UI. The job description of the thread that calls streamingContext.start() is not useful for all the streaming jobs, as it does not make sense for all of the streaming jobs to have the same description, and the description may or may not be related to streaming. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
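For applications that need to sidestep the inheritance described above today, one caller-side workaround is sketched below; this is an assumption about usage, not necessarily what the pull request referenced above does. Clearing the thread-local properties on the thread that starts the context keeps the streaming jobs from inheriting them:

{code}
import org.apache.spark.SparkContext
import org.apache.spark.streaming.StreamingContext

// Clear job group/description and scheduler pool on this thread before
// start(), so jobs generated by the streaming context do not inherit them.
def startWithoutInheritedProps(sc: SparkContext, ssc: StreamingContext): Unit = {
  sc.clearJobGroup()                                 // drops the group id and description
  sc.setLocalProperty("spark.scheduler.pool", null)  // revert to the default pool
  ssc.start()
}
{code}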
[jira] [Assigned] (SPARK-10648) Spark-SQL JDBC fails to set a default precision and scale when they are not defined in an oracle schema.
[ https://issues.apache.org/jira/browse/SPARK-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10648: Assignee: (was: Apache Spark) > Spark-SQL JDBC fails to set a default precision and scale when they are not > defined in an oracle schema. > > > Key: SPARK-10648 > URL: https://issues.apache.org/jira/browse/SPARK-10648 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.0 > Environment: using oracle 11g, ojdbc7.jar >Reporter: Travis Hegner > > Using oracle 11g as a datasource with ojdbc7.jar. When importing data into a > scala app, I am getting an exception "Overflowed precision". Some times I > would get the exception "Unscaled value too large for precision". > This issue likely affects older versions as well, but this was the version I > verified it on. > I narrowed it down to the fact that the schema detection system was trying to > set the precision to 0, and the scale to -127. > I have a proposed pull request to follow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10648) Spark-SQL JDBC fails to set a default precision and scale when they are not defined in an oracle schema.
[ https://issues.apache.org/jira/browse/SPARK-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791081#comment-14791081 ] Apache Spark commented on SPARK-10648: -- User 'travishegner' has created a pull request for this issue: https://github.com/apache/spark/pull/8780 > Spark-SQL JDBC fails to set a default precision and scale when they are not > defined in an oracle schema. > > > Key: SPARK-10648 > URL: https://issues.apache.org/jira/browse/SPARK-10648 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.0 > Environment: using oracle 11g, ojdbc7.jar >Reporter: Travis Hegner > > Using oracle 11g as a datasource with ojdbc7.jar. When importing data into a > scala app, I am getting an exception "Overflowed precision". Some times I > would get the exception "Unscaled value too large for precision". > This issue likely affects older versions as well, but this was the version I > verified it on. > I narrowed it down to the fact that the schema detection system was trying to > set the precision to 0, and the scale to -127. > I have a proposed pull request to follow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10648) Spark-SQL JDBC fails to set a default precision and scale when they are not defined in an oracle schema.
[ https://issues.apache.org/jira/browse/SPARK-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10648: Assignee: Apache Spark > Spark-SQL JDBC fails to set a default precision and scale when they are not > defined in an oracle schema. > > > Key: SPARK-10648 > URL: https://issues.apache.org/jira/browse/SPARK-10648 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.0 > Environment: using oracle 11g, ojdbc7.jar >Reporter: Travis Hegner >Assignee: Apache Spark > > Using oracle 11g as a datasource with ojdbc7.jar. When importing data into a > scala app, I am getting an exception "Overflowed precision". Some times I > would get the exception "Unscaled value too large for precision". > This issue likely affects older versions as well, but this was the version I > verified it on. > I narrowed it down to the fact that the schema detection system was trying to > set the precision to 0, and the scale to -127. > I have a proposed pull request to follow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10648) Spark-SQL JDBC fails to set a default precision and scale when they are not defined in an oracle schema.
Travis Hegner created SPARK-10648: - Summary: Spark-SQL JDBC fails to set a default precision and scale when they are not defined in an oracle schema. Key: SPARK-10648 URL: https://issues.apache.org/jira/browse/SPARK-10648 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Environment: using oracle 11g, ojdbc7.jar Reporter: Travis Hegner Using oracle 11g as a datasource with ojdbc7.jar. When importing data into a scala app, I am getting an exception "Overflowed precision". Some times I would get the exception "Unscaled value too large for precision". This issue likely affects older versions as well, but this was the version I verified it on. I narrowed it down to the fact that the schema detection system was trying to set the precision to 0, and the scale to -127. I have a proposed pull request to follow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
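One possible client-side workaround, sketched under the assumption that the undefined Oracle NUMBER columns are reported with precision 0 as described above; the submitted pull request may take a different approach. It registers a JdbcDialect that supplies a fallback DecimalType, where DecimalType(38, 10) is an arbitrary choice for this sketch:

{code}
import java.sql.Types
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}
import org.apache.spark.sql.types.{DataType, DecimalType, MetadataBuilder}

object LenientOracleDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:oracle")

  override def getCatalystType(
      sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] = {
    // `size` is the reported precision; 0 means it was left undefined in the schema.
    if (sqlType == Types.NUMERIC && size == 0) Some(DecimalType(38, 10)) else None
  }
}

// Register once from the application's setup code, before reading the table.
JdbcDialects.registerDialect(LenientOracleDialect)
{code}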
[jira] [Resolved] (SPARK-6504) Cannot read Parquet files generated from different versions at once
[ https://issues.apache.org/jira/browse/SPARK-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-6504. - Resolution: Fixed Fix Version/s: 1.3.1 This should be fixed. Please reopen if you are still having problems. > Cannot read Parquet files generated from different versions at once > --- > > Key: SPARK-6504 > URL: https://issues.apache.org/jira/browse/SPARK-6504 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.2.1 >Reporter: Marius Soutier > Fix For: 1.3.1 > > > When trying to read Parquet files generated by Spark 1.1.1 and 1.2.1 at the > same time via > `sqlContext.parquetFile("fileFrom1.1.parqut,fileFrom1.2.parquet")` an > exception occurs: > could not merge metadata: key org.apache.spark.sql.parquet.row.metadata has > conflicting values: > [{"type":"struct","fields":[{"name":"date","type":"string","nullable":true,"metadata":{}},{"name":"account","type":"string","nullable":true,"metadata":{}},{"name":"impressions","type":"long","nullable":false,"metadata":{}},{"name":"cost","type":"double","nullable":false,"metadata":{}},{"name":"clicks","type":"long","nullable":false,"metadata":{}},{"name":"conversions","type":"long","nullable":false,"metadata":{}},{"name":"orderValue","type":"double","nullable":false,"metadata":{}}]}, > StructType(List(StructField(date,StringType,true), > StructField(account,StringType,true), > StructField(impressions,LongType,false), StructField(cost,DoubleType,false), > StructField(clicks,LongType,false), StructField(conversions,LongType,false), > StructField(orderValue,DoubleType,false)))] > The Schema is exactly equal. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
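For anyone reading mixed-version Parquet output on newer releases, a sketch with the DataFrame reader; it assumes Spark 1.4+ for the reader API, the mergeSchema option is a 1.5 addition, and the file names are the placeholders from the report:

{code}
// Read both files in one call and request explicit schema reconciliation.
val df = sqlContext.read
  .option("mergeSchema", "true")
  .parquet("fileFrom1.1.parquet", "fileFrom1.2.parquet")
{code}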
[jira] [Resolved] (SPARK-6086) Exceptions in DAGScheduler.updateAccumulators
[ https://issues.apache.org/jira/browse/SPARK-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-6086. --- Resolution: Cannot Reproduce Resolving as "cannot reproduce" for now, pending updates. > Exceptions in DAGScheduler.updateAccumulators > - > > Key: SPARK-6086 > URL: https://issues.apache.org/jira/browse/SPARK-6086 > Project: Spark > Issue Type: Bug > Components: Scheduler, Spark Core, SQL >Affects Versions: 1.3.0 >Reporter: Kai Zeng >Priority: Critical > > Class Cast Exceptions in DAGScheduler.updateAccumulators, when DAGScheduler > is collecting status from tasks. These exceptions happen occasionally, > especially when there are many stages in a job. > Application code: > https://github.com/kai-zeng/spark/blob/accum-bug/examples/src/main/scala/org/apache/spark/examples/sql/hive/SQLSuite.scala > Script used: ./bin/spark-submit --class > org.apache.spark.examples.sql.hive.SQLSuite > examples/target/scala-2.10/spark-examples-1.3.0-SNAPSHOT-hadoop1.0.4.jar > benchmark-cache 6 > There are two types of error messages: > {code} > java.lang.ClassCastException: scala.None$ cannot be cast to > scala.collection.TraversableOnce > at > org.apache.spark.GrowableAccumulableParam.addInPlace(Accumulators.scala:188) > at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:82) > at > org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:340) > at > org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:335) > at > scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) > at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) > at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) > at > scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) > at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) > at scala.collection.mutable.HashMap.foreach(HashMap.scala:98) > at > scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) > at org.apache.spark.Accumulators$.add(Accumulators.scala:335) > at > org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:892) > at > org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1000) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1398) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362) > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) > {code} > {code} > java.lang.ClassCastException: scala.None$ cannot be cast to java.lang.Integer > at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106) > at > org.apache.spark.AccumulatorParam$IntAccumulatorParam$.addInPlace(Accumulators.scala:263) > at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:82) > at > org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:340) > at > org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:335) > at > scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) > at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) > at > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98) > at > scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226) > at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39) > at 
scala.collection.mutable.HashMap.foreach(HashMap.scala:98) > at > scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) > at org.apache.spark.Accumulators$.add(Accumulators.scala:335) > at > org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:892) > at > org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1000) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1398) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362) > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-10050) Support collecting data of MapType in DataFrame
[ https://issues.apache.org/jira/browse/SPARK-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman resolved SPARK-10050. --- Resolution: Fixed Fix Version/s: 1.6.0 Issue resolved by pull request 8711 [https://github.com/apache/spark/pull/8711] > Support collecting data of MapType in DataFrame > --- > > Key: SPARK-10050 > URL: https://issues.apache.org/jira/browse/SPARK-10050 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Reporter: Sun Rui > Fix For: 1.6.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org