[jira] [Updated] (SPARK-12317) Support configuring values with units (e.g. KB/MB/GB) in SQL
[ https://issues.apache.org/jira/browse/SPARK-12317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yadong Qi updated SPARK-12317: -- Summary: Support configuring values with units (e.g. KB/MB/GB) in SQL (was: Values should be configurable with units (e.g. KB/MB/GB) in SQL) > Support configuring values with units (e.g. KB/MB/GB) in SQL > - > > Key: SPARK-12317 > URL: https://issues.apache.org/jira/browse/SPARK-12317 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 1.5.2 > Reporter: Yadong Qi > > e.g. `spark.sql.autoBroadcastJoinThreshold` should be configurable as `10MB` > instead of `10485760`, because `10MB` is easier to read than `10485760`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-12317) Values should be configurable with units (e.g. KB/MB/GB) in SQL
Yadong Qi created SPARK-12317: - Summary: Values should be configurable with units (e.g. KB/MB/GB) in SQL Key: SPARK-12317 URL: https://issues.apache.org/jira/browse/SPARK-12317 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.5.2 Reporter: Yadong Qi e.g. `spark.sql.autoBroadcastJoinThreshold` should be configurable as `10MB` instead of `10485760`, because `10MB` is easier to read than `10485760`.
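The proposal above amounts to accepting a unit-suffixed string wherever a byte count is expected. Below is a minimal sketch of such a parser; it is illustrative only, not Spark's implementation, and the function name and binary (1024-based) multipliers are assumptions:

```python
import re

# Binary multipliers (1 KB = 1024 B), matching how byte-count configs such as
# spark.sql.autoBroadcastJoinThreshold are usually interpreted.
_UNITS = {"b": 1, "k": 1 << 10, "kb": 1 << 10, "m": 1 << 20, "mb": 1 << 20,
          "g": 1 << 30, "gb": 1 << 30, "t": 1 << 40, "tb": 1 << 40}

def byte_string_as_bytes(s: str) -> int:
    """Parse '10MB', '512k', or a bare '10485760' into a byte count."""
    m = re.fullmatch(r"\s*(\d+)\s*([a-zA-Z]*)\s*", s)
    if not m:
        raise ValueError(f"invalid size string: {s!r}")
    value, unit = int(m.group(1)), m.group(2).lower()
    if unit and unit not in _UNITS:
        raise ValueError(f"unknown size unit in: {s!r}")
    # A bare number (empty unit) is taken as plain bytes.
    return value * _UNITS.get(unit, 1)
```

With this, `byte_string_as_bytes("10MB")` and the raw `"10485760"` yield the same value, which is exactly the convenience the issue asks for.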
[jira] [Assigned] (SPARK-12311) [CORE] Restore previous value of "os.arch" property in test suites after forcing it to a specific value
[ https://issues.apache.org/jira/browse/SPARK-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12311: Assignee: Apache Spark > [CORE] Restore previous value of "os.arch" property in test suites after > forcing it to a specific value > > > Key: SPARK-12311 > URL: https://issues.apache.org/jira/browse/SPARK-12311 > Project: Spark > Issue Type: Bug > Components: Spark Core > Affects Versions: 1.5.2 > Reporter: Kazuaki Ishizaki > Assignee: Apache Spark > Priority: Minor > > Although the current BlockManagerSuite.scala and SizeEstimatorSuite.scala set a > specific value (e.g. "amd64") into the system property "os.arch", they do not > restore the original value of "os.arch" after these test suites run. This may > lead to failures in test cases that depend on the architecture on platforms > other than amd64. > They should save the original value of "os.arch" and restore it at the end > of these test suites. > > https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala > https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/util/SizeEstimatorSuite.scala
[jira] [Assigned] (SPARK-12311) [CORE] Restore previous value of "os.arch" property in test suites after forcing it to a specific value
[ https://issues.apache.org/jira/browse/SPARK-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12311: Assignee: (was: Apache Spark) > [CORE] Restore previous value of "os.arch" property in test suites after > forcing it to a specific value > > > Key: SPARK-12311 > URL: https://issues.apache.org/jira/browse/SPARK-12311 > Project: Spark > Issue Type: Bug > Components: Spark Core > Affects Versions: 1.5.2 > Reporter: Kazuaki Ishizaki > Priority: Minor > > Although the current BlockManagerSuite.scala and SizeEstimatorSuite.scala set a > specific value (e.g. "amd64") into the system property "os.arch", they do not > restore the original value of "os.arch" after these test suites run. This may > lead to failures in test cases that depend on the architecture on platforms > other than amd64. > They should save the original value of "os.arch" and restore it at the end > of these test suites. > > https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala > https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/util/SizeEstimatorSuite.scala
[jira] [Commented] (SPARK-12311) [CORE] Restore previous value of "os.arch" property in test suites after forcing it to a specific value
[ https://issues.apache.org/jira/browse/SPARK-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055578#comment-15055578 ] Apache Spark commented on SPARK-12311: -- User 'kiszk' has created a pull request for this issue: https://github.com/apache/spark/pull/10289 > [CORE] Restore previous value of "os.arch" property in test suites after > forcing it to a specific value > > > Key: SPARK-12311 > URL: https://issues.apache.org/jira/browse/SPARK-12311 > Project: Spark > Issue Type: Bug > Components: Spark Core > Affects Versions: 1.5.2 > Reporter: Kazuaki Ishizaki > Priority: Minor > > Although the current BlockManagerSuite.scala and SizeEstimatorSuite.scala set a > specific value (e.g. "amd64") into the system property "os.arch", they do not > restore the original value of "os.arch" after these test suites run. This may > lead to failures in test cases that depend on the architecture on platforms > other than amd64. > They should save the original value of "os.arch" and restore it at the end > of these test suites. > > https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala > https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/util/SizeEstimatorSuite.scala
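The fix requested above is a save-and-restore pattern around the forced property. Sketched here in Python rather than the Scala of the affected suites; the names are illustrative, and the Scala code would do the same thing with System.setProperty/System.clearProperty:

```python
from contextlib import contextmanager

@contextmanager
def forced_property(props: dict, key: str, value: str):
    """Force props[key] to value for the duration of a test, then restore
    the original value, or remove the key if it was absent before."""
    missing = object()               # sentinel: distinguishes "absent" from None
    saved = props.get(key, missing)
    props[key] = value
    try:
        yield
    finally:
        if saved is missing:
            props.pop(key, None)     # key did not exist: clear it again
        else:
            props[key] = saved       # key existed: put the old value back
```

Used as `with forced_property(props, "os.arch", "amd64"): ...`, a test can no longer leak the forced value into later suites, which is precisely the failure mode the issue describes.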
[jira] [Updated] (SPARK-12275) No plan for BroadcastHint in some condition
[ https://issues.apache.org/jira/browse/SPARK-12275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-12275: - Target Version/s: 1.5.3 > No plan for BroadcastHint in some condition > --- > > Key: SPARK-12275 > URL: https://issues.apache.org/jira/browse/SPARK-12275 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: yucai >Assignee: yucai > Fix For: 1.6.1, 2.0.0 > > > *Summary* > No plan for BroadcastHint is generated in some condition. > *Test Case* > {code} > val df1 = Seq((1, "1"), (2, "2")).toDF("key", "value") > val parquetTempFile = > "%s/SPARK-_%d.parquet".format(System.getProperty("java.io.tmpdir"), > scala.util.Random.nextInt) > df1.write.parquet(parquetTempFile) > val pf1 = sqlContext.read.parquet(parquetTempFile) > #1. df1.join(broadcast(pf1)).count() > #2. broadcast(pf1).count() > {code} > *Result* > It will trigger assertion in QueryPlanner.scala, like below: > {code} > scala> df1.join(broadcast(pf1)).count() > java.lang.AssertionError: assertion failed: No plan for BroadcastHint > +- Relation[key#6,value#7] > ParquetRelation[hdfs://10.1.0.20:8020/tmp/SPARK-_1817830406.parquet] > at scala.Predef$.assert(Predef.scala:179) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54) > at > org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:336) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54) > {code} -- This message was sent by 
Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12275) No plan for BroadcastHint in some condition
[ https://issues.apache.org/jira/browse/SPARK-12275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-12275: - Fix Version/s: 2.0.0 1.6.1 > No plan for BroadcastHint in some condition > --- > > Key: SPARK-12275 > URL: https://issues.apache.org/jira/browse/SPARK-12275 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: yucai >Assignee: yucai > Fix For: 1.6.1, 2.0.0 > > > *Summary* > No plan for BroadcastHint is generated in some condition. > *Test Case* > {code} > val df1 = Seq((1, "1"), (2, "2")).toDF("key", "value") > val parquetTempFile = > "%s/SPARK-_%d.parquet".format(System.getProperty("java.io.tmpdir"), > scala.util.Random.nextInt) > df1.write.parquet(parquetTempFile) > val pf1 = sqlContext.read.parquet(parquetTempFile) > #1. df1.join(broadcast(pf1)).count() > #2. broadcast(pf1).count() > {code} > *Result* > It will trigger assertion in QueryPlanner.scala, like below: > {code} > scala> df1.join(broadcast(pf1)).count() > java.lang.AssertionError: assertion failed: No plan for BroadcastHint > +- Relation[key#6,value#7] > ParquetRelation[hdfs://10.1.0.20:8020/tmp/SPARK-_1817830406.parquet] > at scala.Predef$.assert(Predef.scala:179) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54) > at > org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:336) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54) > {code} -- This message was sent by 
Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12316) Stack overflow from endless `Delegation token thread` calls when the application ends
[ https://issues.apache.org/jira/browse/SPARK-12316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SaintBacchus updated SPARK-12316: - Description: When the application ends, the AM cleans the staging dir. But if the driver then triggers a delegation token update, it cannot find the right token file and endlessly calls the method 'updateCredentialsIfRequired', which leads to a StackOverflowError. was: When the application ends, the AM cleans the staging dir. But if the driver then triggers a delegation token update, it cannot find the right token file and endlessly calls the method 'updateCredentialsIfRequired' > Stack overflow from endless `Delegation token thread` calls when the > application ends > --- > > Key: SPARK-12316 > URL: https://issues.apache.org/jira/browse/SPARK-12316 > Project: Spark > Issue Type: Bug > Components: YARN > Affects Versions: 1.6.0 > Reporter: SaintBacchus > > When the application ends, the AM cleans the staging dir. > But if the driver then triggers a delegation token update, it cannot find > the right token file and endlessly calls the method > 'updateCredentialsIfRequired'. > This leads to a StackOverflowError.
[jira] [Updated] (SPARK-12275) No plan for BroadcastHint in some condition
[ https://issues.apache.org/jira/browse/SPARK-12275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-12275: - Assignee: yucai > No plan for BroadcastHint in some condition > --- > > Key: SPARK-12275 > URL: https://issues.apache.org/jira/browse/SPARK-12275 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: yucai >Assignee: yucai > > *Summary* > No plan for BroadcastHint is generated in some condition. > *Test Case* > {code} > val df1 = Seq((1, "1"), (2, "2")).toDF("key", "value") > val parquetTempFile = > "%s/SPARK-_%d.parquet".format(System.getProperty("java.io.tmpdir"), > scala.util.Random.nextInt) > df1.write.parquet(parquetTempFile) > val pf1 = sqlContext.read.parquet(parquetTempFile) > #1. df1.join(broadcast(pf1)).count() > #2. broadcast(pf1).count() > {code} > *Result* > It will trigger assertion in QueryPlanner.scala, like below: > {code} > scala> df1.join(broadcast(pf1)).count() > java.lang.AssertionError: assertion failed: No plan for BroadcastHint > +- Relation[key#6,value#7] > ParquetRelation[hdfs://10.1.0.20:8020/tmp/SPARK-_1817830406.parquet] > at scala.Predef$.assert(Predef.scala:179) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54) > at > org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:336) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58) > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To 
unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-12316) Stack overflow from endless `Delegation token thread` calls when the application ends
SaintBacchus created SPARK-12316: Summary: Stack overflow from endless `Delegation token thread` calls when the application ends Key: SPARK-12316 URL: https://issues.apache.org/jira/browse/SPARK-12316 Project: Spark Issue Type: Bug Components: YARN Affects Versions: 1.6.0 Reporter: SaintBacchus When the application ends, the AM cleans the staging dir. But if the driver then triggers a delegation token update, it cannot find the right token file and endlessly calls the method 'updateCredentialsIfRequired'
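The reported cycle can be avoided by iterating with an explicit stop condition instead of re-invoking the update recursively. A hypothetical sketch of that shape, with illustrative function and callback names rather than Spark's actual YARN code:

```python
def update_credentials_if_required(read_token_file, app_ended, max_attempts=3):
    """Try to refresh credentials; return True on success, False if we gave up.

    If each failed update re-invoked itself recursively, a permanently
    missing token file (e.g. staging dir already cleaned at shutdown)
    would recurse until the stack overflows. Iterating with a bounded
    attempt count and an app_ended() guard breaks that cycle.
    """
    for _ in range(max_attempts):
        if app_ended():
            return False              # staging dir is gone; stop retrying
        try:
            read_token_file()
            return True
        except FileNotFoundError:
            continue                  # retry by looping, never by recursing
    return False
```

The key design choice is that a missing token file is treated as a retryable (and ultimately abandonable) condition rather than a reason to schedule another identical call.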
[jira] [Resolved] (SPARK-12213) Query with only one distinct should not have Expand
[ https://issues.apache.org/jira/browse/SPARK-12213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai resolved SPARK-12213. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10228 [https://github.com/apache/spark/pull/10228] > Query with only one distinct should not have Expand > > > Key: SPARK-12213 > URL: https://issues.apache.org/jira/browse/SPARK-12213 > Project: Spark > Issue Type: Improvement > Components: SQL > Reporter: Davies Liu > Assignee: Davies Liu > Fix For: 2.0.0 > > > Expand will double the number of records and slow down projection and > aggregation; it's better to generate a plan without Expand for a query with > only one distinct (for example, ss_max in TPC-DS)
[jira] [Commented] (SPARK-12176) SparkLauncher's setConf() does not support configs containing spaces
[ https://issues.apache.org/jira/browse/SPARK-12176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055508#comment-15055508 ] Saisai Shao commented on SPARK-12176: - It works fine in my local test against the latest master branch; there seems to be no such issue. Probably it only exists in older versions of Spark. > SparkLauncher's setConf() does not support configs containing spaces > > > Key: SPARK-12176 > URL: https://issues.apache.org/jira/browse/SPARK-12176 > Project: Spark > Issue Type: Bug > Components: Spark Core > Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2 > Environment: All > Reporter: Yuhang Chen > Priority: Minor > > spark-submit uses the '--conf K=V' pattern for setting configs. According to > the docs, if the 'V' you set has spaces in it, the whole 'K=V' part should > be wrapped in quotes. > However, SparkLauncher (org.apache.spark.launcher.SparkLauncher) does > not do that wrapping for you, and there is no way to do the wrapping yourself > with the API it provides. > For example, I want to add {{-XX:+PrintGCDetails -XX:+PrintGCTimeStamps}} for > executors (spark.executor.extraJavaOptions), and the conf contains a space. > For spark-submit, I should wrap the conf in quotes like this: > {code} > --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails > -XX:+PrintGCTimeStamps" > {code} > But when I use the setConf() API of SparkLauncher, I write code like this: > {code} > launcher.setConf("spark.executor.extraJavaOptions", "-XX:+PrintGCDetails > -XX:+PrintGCTimeStamps"); > {code} > Now, SparkLauncher uses Java's ProcessBuilder to start a sub-process, in > which spark-submit is finally executed. And it turns out that the final > command is like this: > {code} > --conf spark.executor.extraJavaOptions=-XX:+PrintGCDetails > -XX:+PrintGCTimeStamps > {code} > See? The quotes are gone, and the job could not be launched with this
> command. > Then I checked the source: all confs are stored in a Map before the launch > command is generated. Thus, my advice is to check all values of the conf Map > and do the wrapping during command building.
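The advice in the report, quoting any conf value that contains whitespace while the launch command is being built, can be sketched as below. This assumes the resulting command line is later interpreted by a shell (with ProcessBuilder's list-of-args form no quoting is needed); the function name and conf-map shape are illustrative, not SparkLauncher's real internals:

```python
import shlex

def build_conf_args(conf: dict) -> list:
    """Build '--conf K=V' argument pairs, shell-quoting each K=V so a value
    with spaces (e.g. multiple JVM flags) survives as a single argument."""
    args = []
    for key, value in conf.items():
        args += ["--conf", shlex.quote(f"{key}={value}")]
    return args
```

For `{"spark.executor.extraJavaOptions": "-XX:+PrintGCDetails -XX:+PrintGCTimeStamps"}` this emits the single-quoted `'K=V'` form, which is exactly the wrapping the reporter had to add by hand for spark-submit.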
[jira] [Resolved] (SPARK-12281) Fixed potential exceptions when exiting a local cluster.
[ https://issues.apache.org/jira/browse/SPARK-12281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-12281. -- Resolution: Fixed Assignee: Shixiong Zhu (was: Apache Spark) Fix Version/s: 2.0.0 1.6.1 > Fixed potential exceptions when exiting a local cluster. > > > Key: SPARK-12281 > URL: https://issues.apache.org/jira/browse/SPARK-12281 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu > Fix For: 1.6.1, 2.0.0 > > > Fixed the following potential exceptions when exiting a local cluster. > {code} > java.lang.AssertionError: assertion failed: executor 4 state transfer from > RUNNING to RUNNING is illegal > at scala.Predef$.assert(Predef.scala:179) > at > org.apache.spark.deploy.master.Master$$anonfun$receive$1.applyOrElse(Master.scala:260) > at > org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116) > at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204) > at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) > at > org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} > {code} > java.lang.IllegalStateException: Shutdown hooks cannot be modified during > shutdown. 
> at > org.apache.spark.util.SparkShutdownHookManager.add(ShutdownHookManager.scala:246) > at > org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:191) > at > org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:180) > at > org.apache.spark.deploy.worker.ExecutorRunner.start(ExecutorRunner.scala:73) > at > org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:474) > at > org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116) > at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204) > at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) > at > org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12180) DataFrame.join() in PySpark gives misleading exception when column name exists on both sides
[ https://issues.apache.org/jira/browse/SPARK-12180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055429#comment-15055429 ] Jeff Zhang commented on SPARK-12180: Could you paste your code? Joining two DataFrames with common fields works fine for me {code} In [12]: df1.join(df2, df1.name==df2.name) // both have column "id" besides the join key "name" Out[12]: DataFrame[id: bigint, name: string, id: bigint, name: bigint] {code} > DataFrame.join() in PySpark gives misleading exception when column name > exists on both sides > --- > > Key: SPARK-12180 > URL: https://issues.apache.org/jira/browse/SPARK-12180 > Project: Spark > Issue Type: Bug > Components: PySpark > Affects Versions: 1.5.2 > Reporter: Daniel Thomas > > When joining two DataFrames on a column 'session_uuid' I got the following > exception, because both DataFrames had a column called 'at'. The exception is > misleading about the cause and about the column causing the problem. Renaming > the column fixed the exception. > --- > Py4JJavaError Traceback (most recent call last) > /Applications/spark-1.5.2-bin-hadoop2.4/python/pyspark/sql/utils.py in > deco(*a, **kw) > 35 try: > ---> 36 return f(*a, **kw) > 37 except py4j.protocol.Py4JJavaError as e: > /Applications/spark-1.5.2-bin-hadoop2.4/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py > in get_return_value(answer, gateway_client, target_id, name) > 299 'An error occurred while calling {0}{1}{2}.\n'. > --> 300 format(target_id, '.', name), value) > 301 else: > Py4JJavaError: An error occurred while calling o484.join.
> : org.apache.spark.sql.AnalysisException: resolved attribute(s) > session_uuid#3278 missing from > uuid_x#9078,total_session_sec#9115L,at#3248,session_uuid#9114,uuid#9117,at#9084 > in operator !Join Inner, Some((uuid_x#9078 = session_uuid#3278)); > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:37) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:44) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:154) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:49) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:103) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:49) > at > org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:44) > at > org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:914) > at org.apache.spark.sql.DataFrame.(DataFrame.scala:132) > at > org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$logicalPlanToDataFrame(DataFrame.scala:154) > at org.apache.spark.sql.DataFrame.join(DataFrame.scala:553) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231) > at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379) > at py4j.Gateway.invoke(Gateway.java:259) > at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133) > at py4j.commands.CallCommand.execute(CallCommand.java:79) > at py4j.GatewayConnection.run(GatewayConnection.java:207) > at java.lang.Thread.run(Thread.java:745) > During 
handling of the above exception, another exception occurred: > AnalysisException Traceback (most recent call last) > in () > 1 sel_starts = starts.select('uuid', 'at').withColumnRenamed('uuid', > 'uuid_x')#.withColumnRenamed('at', 'at_x') > 2 sel_closes = closes.select('uuid', 'at', 'session_uuid', > 'total_session_sec') > > 3 start_close = sel_starts.join(sel_closes, sel_starts['uuid_x'] == > sel_closes['session_uuid']) > 4 start_close.cache() > 5 start_close.take(1) > /Applications/spark-1.5.2-bin-hadoop2.4/python/pyspark/sql/dataframe.py in > join(self, other, on, how) > 579 on = on[0] > 580 if how is None: > --> 581 jdf = self._jdf.join(other._jdf, on._jc, "inner") > 582 else: > 583 assert isinst
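The workaround mentioned in the description, renaming the clashing column before the join, generalizes to suffixing any name that appears on both sides. A pure-Python stand-in for that rename step (the function name and suffix are illustrative; in PySpark the same effect comes from withColumnRenamed, as in the reporter's snippet):

```python
def disambiguate(left_cols, right_cols, suffix="_x"):
    """Return the left-side column names, suffixing any name that also
    appears on the right side so a later join cannot confuse the two."""
    taken = set(right_cols)
    return [c + suffix if c in taken else c for c in left_cols]
```

Applied to the reporter's schemas, both 'uuid' and the troublesome 'at' get suffixed on the left side, which is exactly the manual rename that made the exception go away.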
[jira] [Commented] (SPARK-12057) Prevent failure on corrupt JSON records
[ https://issues.apache.org/jira/browse/SPARK-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055414#comment-15055414 ] Apache Spark commented on SPARK-12057: -- User 'yhuai' has created a pull request for this issue: https://github.com/apache/spark/pull/10288 > Prevent failure on corrupt JSON records > --- > > Key: SPARK-12057 > URL: https://issues.apache.org/jira/browse/SPARK-12057 > Project: Spark > Issue Type: Improvement > Components: SQL > Reporter: Ian Macalinao > Priority: Minor > > Return the failed record when a record cannot be parsed, allowing parsing of > files containing corrupt records of any form. Currently a corrupt record > throws an exception, causing the entire job to fail.
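The behavior requested above, keep parsing and capture malformed records instead of failing the whole job, can be sketched with a line-oriented JSON reader. The `_corrupt_record` field name mirrors Spark's convention for its JSON reader; the rest is an illustrative stand-in, not Spark's implementation:

```python
import json

def parse_json_lines(lines):
    """Parse one JSON document per line; malformed lines are kept as rows
    with their raw text under '_corrupt_record' instead of raising."""
    rows = []
    for line in lines:
        try:
            rows.append(json.loads(line))
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            rows.append({"_corrupt_record": line})
    return rows
```

A downstream job can then filter on the presence of `_corrupt_record` to inspect or drop bad rows, rather than losing the entire run to one bad line.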
[jira] [Commented] (SPARK-12315) isnotnull operator not pushed down for JDBC datasource.
[ https://issues.apache.org/jira/browse/SPARK-12315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055358#comment-15055358 ] Apache Spark commented on SPARK-12315: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/10287 > isnotnull operator not pushed down for JDBC datasource. > --- > > Key: SPARK-12315 > URL: https://issues.apache.org/jira/browse/SPARK-12315 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 1.6.0 > Reporter: Hyukjin Kwon > > {{IsNotNull}} filter is not being pushed down for the JDBC datasource. > It looks like this is part of the SQL standard (SQL-92, SQL:1999, SQL:2003, > SQL:201x), and I believe most databases support it.
[jira] [Assigned] (SPARK-12315) isnotnull operator not pushed down for JDBC datasource.
[ https://issues.apache.org/jira/browse/SPARK-12315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12315: Assignee: Apache Spark > isnotnull operator not pushed down for JDBC datasource. > --- > > Key: SPARK-12315 > URL: https://issues.apache.org/jira/browse/SPARK-12315 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 1.6.0 > Reporter: Hyukjin Kwon > Assignee: Apache Spark > > {{IsNotNull}} filter is not being pushed down for the JDBC datasource. > It looks like this is part of the SQL standard (SQL-92, SQL:1999, SQL:2003, > SQL:201x), and I believe most databases support it.
[jira] [Assigned] (SPARK-12315) isnotnull operator not pushed down for JDBC datasource.
[ https://issues.apache.org/jira/browse/SPARK-12315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12315: Assignee: (was: Apache Spark) > isnotnull operator not pushed down for JDBC datasource. > --- > > Key: SPARK-12315 > URL: https://issues.apache.org/jira/browse/SPARK-12315 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 1.6.0 > Reporter: Hyukjin Kwon > > {{IsNotNull}} filter is not being pushed down for the JDBC datasource. > It looks like this is part of the SQL standard (SQL-92, SQL:1999, SQL:2003, > SQL:201x), and I believe most databases support it.
[jira] [Commented] (SPARK-12314) isnull operator not pushed down for JDBC datasource.
[ https://issues.apache.org/jira/browse/SPARK-12314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055356#comment-15055356 ] Apache Spark commented on SPARK-12314: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/10286 > isnull operator not pushed down for JDBC datasource. > > > Key: SPARK-12314 > URL: https://issues.apache.org/jira/browse/SPARK-12314 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 1.6.0 > Reporter: Hyukjin Kwon > > {{IsNull}} filter is not being pushed down for the JDBC datasource. > It looks like this is part of the SQL standard (SQL-92, SQL:1999, SQL:2003, > SQL:201x), and I believe most databases support it.
[jira] [Assigned] (SPARK-12314) isnull operator not pushed down for JDBC datasource.
[ https://issues.apache.org/jira/browse/SPARK-12314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12314: Assignee: Apache Spark > isnull operator not pushed down for JDBC datasource. > > > Key: SPARK-12314 > URL: https://issues.apache.org/jira/browse/SPARK-12314 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 1.6.0 > Reporter: Hyukjin Kwon > Assignee: Apache Spark > > {{IsNull}} filter is not being pushed down for the JDBC datasource. > It looks like this is part of the SQL standard (SQL-92, SQL:1999, SQL:2003, > SQL:201x), and I believe most databases support it.
[jira] [Assigned] (SPARK-12314) isnull operator not pushed down for JDBC datasource.
[ https://issues.apache.org/jira/browse/SPARK-12314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12314: Assignee: (was: Apache Spark) > isnull operator not pushed down for JDBC datasource. > > > Key: SPARK-12314 > URL: https://issues.apache.org/jira/browse/SPARK-12314 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 1.6.0 > Reporter: Hyukjin Kwon > > {{IsNull}} filter is not being pushed down for the JDBC datasource. > It looks like this is part of the SQL standard (SQL-92, SQL:1999, SQL:2003, > SQL:201x), and I believe most databases support it.
[jira] [Commented] (SPARK-12315) isnotnull operator not pushed down for JDBC datasource.
[ https://issues.apache.org/jira/browse/SPARK-12315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055347#comment-15055347 ] Hyukjin Kwon commented on SPARK-12315: -- I will work on this. > isnotnull operator not pushed down for JDBC datasource. > --- > > Key: SPARK-12315 > URL: https://issues.apache.org/jira/browse/SPARK-12315 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.0 >Reporter: Hyukjin Kwon > > {{IsNotNull}} filter is not being pushed down for the JDBC datasource. > It looks like this is part of the SQL standard (SQL-92, SQL:1999, SQL:2003 and > SQL:201x) and I believe most databases support it.
[jira] [Commented] (SPARK-12314) isnull operator not pushed down for JDBC datasource.
[ https://issues.apache.org/jira/browse/SPARK-12314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055344#comment-15055344 ] Hyukjin Kwon commented on SPARK-12314: -- I will work on this. > isnull operator not pushed down for JDBC datasource. > > > Key: SPARK-12314 > URL: https://issues.apache.org/jira/browse/SPARK-12314 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.6.0 >Reporter: Hyukjin Kwon > > {{IsNull}} filter is not being pushed down for the JDBC datasource. > It looks like this is part of the SQL standard (SQL-92, SQL:1999, SQL:2003, SQL:201x) > and I believe most databases support it.
[jira] [Created] (SPARK-12315) isnotnull operator not pushed down for JDBC datasource.
Hyukjin Kwon created SPARK-12315: Summary: isnotnull operator not pushed down for JDBC datasource. Key: SPARK-12315 URL: https://issues.apache.org/jira/browse/SPARK-12315 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.6.0 Reporter: Hyukjin Kwon {{IsNotNull}} filter is not being pushed down for the JDBC datasource. It looks like this is part of the SQL standard (SQL-92, SQL:1999, SQL:2003 and SQL:201x) and I believe most databases support it.
[jira] [Created] (SPARK-12314) isnull operator not pushed down for JDBC datasource.
Hyukjin Kwon created SPARK-12314: Summary: isnull operator not pushed down for JDBC datasource. Key: SPARK-12314 URL: https://issues.apache.org/jira/browse/SPARK-12314 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.6.0 Reporter: Hyukjin Kwon {{IsNull}} filter is not being pushed down for the JDBC datasource. It looks like this is part of the SQL standard (SQL-92, SQL:1999, SQL:2003, SQL:201x) and I believe most databases support it.
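The pushdown behavior these two issues describe can be sketched as a small filter-to-SQL compiler. The sketch below is standalone and only loosely modeled on Spark's `org.apache.spark.sql.sources` filter API; the `FilterCompiler` object and its methods are illustrative names, not Spark's actual JDBC internals. Filters that compile to a SQL fragment run in the database; anything returning `None` would stay in Spark and be evaluated after the rows are fetched.

```scala
// Simplified model of JDBC filter pushdown: each source filter is compiled
// to a SQL WHERE fragment; filters that cannot be compiled are not pushed
// down and must be evaluated by Spark after the fetch.
sealed trait Filter
case class EqualTo(attr: String, value: Any) extends Filter
case class IsNull(attr: String) extends Filter
case class IsNotNull(attr: String) extends Filter

object FilterCompiler {
  // Returns None for filters the JDBC source cannot push down.
  def compile(f: Filter): Option[String] = f match {
    case EqualTo(a, v: String) => Some(s"$a = '$v'")
    case EqualTo(a, v)         => Some(s"$a = $v")
    case IsNull(a)             => Some(s"$a IS NULL")     // the gap SPARK-12314 fills
    case IsNotNull(a)          => Some(s"$a IS NOT NULL") // the gap SPARK-12315 fills
  }

  def whereClause(filters: Seq[Filter]): String = {
    val compiled = filters.flatMap(compile)
    if (compiled.isEmpty) "" else compiled.mkString("WHERE ", " AND ", "")
  }
}

println(FilterCompiler.whereClause(Seq(IsNull("name"), EqualTo("age", 30))))
// WHERE name IS NULL AND age = 30
```

Since `IS NULL` / `IS NOT NULL` are standard SQL, compiling them unconditionally is safe for essentially any JDBC backend, which is the argument the reporter makes.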
[jira] [Commented] (SPARK-12288) Support UnsafeRow in Coalesce/Except/Intersect
[ https://issues.apache.org/jira/browse/SPARK-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055256#comment-15055256 ] Apache Spark commented on SPARK-12288: -- User 'gatorsmile' has created a pull request for this issue: https://github.com/apache/spark/pull/10285 > Support UnsafeRow in Coalesce/Except/Intersect > -- > > Key: SPARK-12288 > URL: https://issues.apache.org/jira/browse/SPARK-12288 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu >
[jira] [Assigned] (SPARK-12288) Support UnsafeRow in Coalesce/Except/Intersect
[ https://issues.apache.org/jira/browse/SPARK-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12288: Assignee: (was: Apache Spark) > Support UnsafeRow in Coalesce/Except/Intersect > -- > > Key: SPARK-12288 > URL: https://issues.apache.org/jira/browse/SPARK-12288 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu >
[jira] [Assigned] (SPARK-12288) Support UnsafeRow in Coalesce/Except/Intersect
[ https://issues.apache.org/jira/browse/SPARK-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12288: Assignee: Apache Spark > Support UnsafeRow in Coalesce/Except/Intersect > -- > > Key: SPARK-12288 > URL: https://issues.apache.org/jira/browse/SPARK-12288 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu >Assignee: Apache Spark >
[jira] [Issue Comment Deleted] (SPARK-12288) Support UnsafeRow in Coalesce/Except/Intersect
[ https://issues.apache.org/jira/browse/SPARK-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-12288: Comment: was deleted (was: [~davies] It sounds like Except and Intersect can support UnsafeRow? We just need to add the following lines to Except and Intersect {code} override def outputsUnsafeRows: Boolean = children.forall(_.outputsUnsafeRows) override def canProcessUnsafeRows: Boolean = true override def canProcessSafeRows: Boolean = true {code} Is my understanding correct? Thanks!) > Support UnsafeRow in Coalesce/Except/Intersect > -- > > Key: SPARK-12288 > URL: https://issues.apache.org/jira/browse/SPARK-12288 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu >
[jira] [Commented] (SPARK-12062) Master rebuilding historical SparkUI should be asynchronous
[ https://issues.apache.org/jira/browse/SPARK-12062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055207#comment-15055207 ] Apache Spark commented on SPARK-12062: -- User 'BryanCutler' has created a pull request for this issue: https://github.com/apache/spark/pull/10284 > Master rebuilding historical SparkUI should be asynchronous > --- > > Key: SPARK-12062 > URL: https://issues.apache.org/jira/browse/SPARK-12062 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 1.0.0 >Reporter: Andrew Or >Assignee: Bryan Cutler > > When a long-running application finishes, it takes a while (sometimes > minutes) to rebuild the SparkUI. However, in Master.scala this is currently > done within the RPC event loop, which runs in only one thread. Thus, in the > meantime no other applications can register with this master.
[jira] [Commented] (SPARK-12062) Master rebuilding historical SparkUI should be asynchronous
[ https://issues.apache.org/jira/browse/SPARK-12062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055205#comment-15055205 ] Bryan Cutler commented on SPARK-12062: -- I read the past conversations discussing this in the related JIRAs and agree that it would be better to eventually remove this functionality from the master. I'll go ahead and post the PR I have ready; maybe it will be of some use in the meantime, before SPARK-12299. > Master rebuilding historical SparkUI should be asynchronous > --- > > Key: SPARK-12062 > URL: https://issues.apache.org/jira/browse/SPARK-12062 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 1.0.0 >Reporter: Andrew Or >Assignee: Bryan Cutler > > When a long-running application finishes, it takes a while (sometimes > minutes) to rebuild the SparkUI. However, in Master.scala this is currently > done within the RPC event loop, which runs in only one thread. Thus, in the > meantime no other applications can register with this master.
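The fix direction discussed in this issue can be sketched as follows: instead of running the slow UI rebuild inside the single-threaded RPC event loop, hand it to a separate thread so the loop returns immediately. This is a minimal standalone sketch; `rebuildUiAsync` and the app ID are illustrative names, not Spark's actual Master internals.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}

// Run the slow rebuild on its own thread so the caller (the "event loop")
// is not blocked and can keep registering applications.
def rebuildUiAsync(appId: String)(slowRebuild: String => Unit): Thread = {
  val worker = new Thread(() => slowRebuild(appId), s"rebuild-ui-$appId")
  worker.setDaemon(true) // don't keep the JVM alive just for UI rebuilds
  worker.start()
  worker
}

val done = new CountDownLatch(1)
rebuildUiAsync("app-20151213-0001") { _ =>
  Thread.sleep(100) // stands in for minutes of event-log replay
  done.countDown()
}
// Control returns here at once; the event loop stays responsive.
println("event loop free while rebuild runs")
done.await(5, TimeUnit.SECONDS)
```

The trade-off, as noted in the related discussion, is that offloading only hides the cost; removing the rebuild from the Master entirely (SPARK-12299) is the cleaner long-term fix.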
[jira] [Updated] (SPARK-12313) getPartitionsByFilter doesn't handle predicates on all / multiple Partition Columns
[ https://issues.apache.org/jira/browse/SPARK-12313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gobinathan SP updated SPARK-12313: -- Description: When spark.sql.hive.metastorePartitionPruning is enabled, getPartitionsByFilter is used. For a table partitioned by p1 and p2, when hc.sql("select col from tabl1 where p1='p1V' and p2= 'p2V' ") is triggered, the HiveShim identifies the predicates and ConvertFilters returns p1='p1V' and col2= 'p2V'. In these cases no result is returned; the number of returned rows is always zero. However, a filter on a single column always works. Probably it doesn't come through this route. I'm using Oracle for the Metastore. was: When spark.sql.hive.metastorePartitionPruning is enabled, getPartitionsByFilter is used. For a table partitioned by p1 and p2, when hc.sql("select col from tabl1 where p1='p1V' and p2= 'p2V' ") is triggered, the HiveShim identifies the predicates and ConvertFilters returns p1='p1V' and col2= 'p2V'. In these cases no result is returned; the number of returned rows is always zero. However, a filter on a single column always works. Probably it doesn't come through this route. > getPartitionsByFilter doesn't handle predicates on all / multiple Partition > Columns > -- > > Key: SPARK-12313 > URL: https://issues.apache.org/jira/browse/SPARK-12313 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.1 >Reporter: Gobinathan SP >Priority: Minor > > When spark.sql.hive.metastorePartitionPruning is enabled, > getPartitionsByFilter is used. > For a table partitioned by p1 and p2, when hc.sql("select col > from tabl1 where p1='p1V' and p2= 'p2V' ") is triggered, > the HiveShim identifies the predicates and ConvertFilters returns p1='p1V' > and col2= 'p2V'. > In these cases no result is returned; the number of returned rows is > always zero. > However, a filter on a single column always works. Probably it doesn't come > through this route. > I'm using Oracle for the Metastore.
[jira] [Updated] (SPARK-12313) getPartitionsByFilter doesn't handle predicates on all / multiple Partition Columns
[ https://issues.apache.org/jira/browse/SPARK-12313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gobinathan SP updated SPARK-12313: -- Description: When spark.sql.hive.metastorePartitionPruning is enabled, getPartitionsByFilter is used. For a table partitioned by p1 and p2, when hc.sql("select col from tabl1 where p1='p1V' and p2= 'p2V' ") is triggered, the HiveShim identifies the predicates and ConvertFilters returns p1='p1V' and col2= 'p2V'. In these cases no result is returned; the number of returned rows is always zero. However, a filter on a single column always works. Probably it doesn't come through this route. was: When spark.sql.hive.metastorePartitionPruning is enabled, getPartitionsByFilter is used. For a table partitioned by p1 and p2, when hc.sql("select col from tabl1 where p1='p1V' and p2= 'p2V' ") is triggered, the HiveShim identifies the predicates and ConvertFilters returns p1='p1V' and col2= 'p2V'. In these cases no result is returned; the number of returned rows is always zero. However, a filter on a single column always works. Probably it doesn't come through this route. > getPartitionsByFilter doesn't handle predicates on all / multiple Partition > Columns > -- > > Key: SPARK-12313 > URL: https://issues.apache.org/jira/browse/SPARK-12313 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.1 >Reporter: Gobinathan SP >Priority: Minor > > When spark.sql.hive.metastorePartitionPruning is enabled, > getPartitionsByFilter is used. > For a table partitioned by p1 and p2, when hc.sql("select col > from tabl1 where p1='p1V' and p2= 'p2V' ") is triggered, > the HiveShim identifies the predicates and ConvertFilters returns p1='p1V' > and col2= 'p2V'. > In these cases no result is returned; the number of returned rows is > always zero. > However, a filter on a single column always works. Probably it doesn't come > through this route.
[jira] [Updated] (SPARK-12313) getPartitionsByFilter doesn't handle predicates on all / multiple Partition Columns
[ https://issues.apache.org/jira/browse/SPARK-12313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gobinathan SP updated SPARK-12313: -- Description: When spark.sql.hive.metastorePartitionPruning is enabled, getPartitionsByFilter is used. For a table partitioned by p1 and p2, when hc.sql("select col from tabl1 where p1='p1V' and p2= 'p2V' ") is triggered, the HiveShim identifies the predicates and ConvertFilters returns p1='p1V' and col2= 'p2V'. In these cases no result is returned; the number of returned rows is always zero. However, a filter on a single column always works. Probably it doesn't come through this route. was: When spark.sql.hive.metastorePartitionPruning is enabled, getPartitionsByFilter is used. For a table partitioned by p1 and p2, when hc.sql("select col from tabl1 where p1='p1V' and col2= 'p2V' ") is triggered, the HiveShim identifies the predicates and ConvertFilters returns p1='p1V' and col2= 'p2V'. In these cases no result is returned; the number of returned rows is always zero. However, a filter on a single column always works. Probably it doesn't come through this route. > getPartitionsByFilter doesn't handle predicates on all / multiple Partition > Columns > -- > > Key: SPARK-12313 > URL: https://issues.apache.org/jira/browse/SPARK-12313 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.1 >Reporter: Gobinathan SP >Priority: Minor > > When spark.sql.hive.metastorePartitionPruning is enabled, > getPartitionsByFilter is used. > For a table partitioned by p1 and p2, when hc.sql("select col > from tabl1 where p1='p1V' and p2= 'p2V' ") is triggered, > the HiveShim identifies the predicates and ConvertFilters returns p1='p1V' > and col2= 'p2V'. > In these cases no result is returned; the number of returned rows is > always zero. > However, a filter on a single column always works. Probably it doesn't come > through this route.
[jira] [Created] (SPARK-12313) getPartitionsByFilter doesn't handle predicates on all / multiple Partition Columns
Gobinathan SP created SPARK-12313: - Summary: getPartitionsByFilter doesn't handle predicates on all / multiple Partition Columns Key: SPARK-12313 URL: https://issues.apache.org/jira/browse/SPARK-12313 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.1 Reporter: Gobinathan SP Priority: Minor When spark.sql.hive.metastorePartitionPruning is enabled, getPartitionsByFilter is used. For a table partitioned by p1 and p2, when hc.sql("select col from tabl1 where p1='p1V' and col2= 'p2V' ") is triggered, the HiveShim identifies the predicates and ConvertFilters returns p1='p1V' and col2= 'p2V'. In these cases no result is returned; the number of returned rows is always zero. However, a filter on a single column always works. Probably it doesn't come through this route.
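The conversion step the reporter describes (turning predicates on partition columns into a single metastore filter string) can be modeled with a toy function. `convertFilters` is the name used in the report; the sketch below is standalone and illustrative, not Spark's actual HiveShim code. The point is that equality predicates on every partition column must survive the conversion and be joined together, or the metastore call returns zero partitions for multi-column filters.

```scala
// Toy model: compile equality predicates on partition columns into one
// metastore partition-filter string, joined with "and". Predicates on
// non-partition columns are dropped here and evaluated by Spark instead.
def convertFilters(partitionCols: Set[String], predicates: Seq[(String, String)]): String =
  predicates
    .filter { case (col, _) => partitionCols.contains(col) }
    .map { case (col, value) => s"""$col = "$value"""" }
    .mkString(" and ")

println(convertFilters(Set("p1", "p2"), Seq("p1" -> "p1V", "p2" -> "p2V", "col" -> "x")))
// p1 = "p1V" and p2 = "p2V"
```

If the second partition-column predicate were lost or mangled at this stage, the filter sent to the metastore would match no partitions, which would produce exactly the "always zero rows" symptom reported above.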
[jira] [Commented] (SPARK-11410) Add a DataFrame API that provides functionality similar to HiveQL's DISTRIBUTE BY
[ https://issues.apache.org/jira/browse/SPARK-11410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055198#comment-15055198 ] Yin Huai commented on SPARK-11410: -- Oh, I see. This is the table partitioning mechanism. If you use partitionBy before writing this table, we will understand that the table is partitioned by column {{column}} and can skip unnecessary partitions when scanning the table. This JIRA is actually for another feature, which lets users control how data is shuffled by using the hash value of given columns. > Add a DataFrame API that provides functionality similar to HiveQL's > DISTRIBUTE BY > - > > Key: SPARK-11410 > URL: https://issues.apache.org/jira/browse/SPARK-11410 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.5.1 >Reporter: Nong Li >Assignee: Nong Li > Fix For: 1.6.0 > > > DISTRIBUTE BY allows the user to control the partitioning and ordering of a > data set, which can be very useful for some applications.
[jira] [Commented] (SPARK-11410) Add a DataFrame API that provides functionality similar to HiveQL's DISTRIBUTE BY
[ https://issues.apache.org/jira/browse/SPARK-11410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055189#comment-15055189 ] Maciej Bryński commented on SPARK-11410: And what about this example: we can partition a table by column, then run the query: {code} select * from table where column = value {code} In this case Spark should scan only one partition. PS. Is partitioning saved in Parquet format? > Add a DataFrame API that provides functionality similar to HiveQL's > DISTRIBUTE BY > - > > Key: SPARK-11410 > URL: https://issues.apache.org/jira/browse/SPARK-11410 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.5.1 >Reporter: Nong Li >Assignee: Nong Li > Fix For: 1.6.0 > > > DISTRIBUTE BY allows the user to control the partitioning and ordering of a > data set, which can be very useful for some applications.
[jira] [Comment Edited] (SPARK-11410) Add a DataFrame API that provides functionality similar to HiveQL's DISTRIBUTE BY
[ https://issues.apache.org/jira/browse/SPARK-11410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055189#comment-15055189 ] Maciej Bryński edited comment on SPARK-11410 at 12/13/15 10:16 PM: --- And what about this example: we can partition a table by column, then run the query: {code} select * from table where column = value {code} In this case Spark should scan only one partition. PS. Is partitioning saved in Parquet format? was (Author: maver1ck): And what about this example: we can partition a table by column, then run the query: {code} select * from table where column = value {code} In this case Spark should scan only one partition. PS. Is partitioning saved in Parquet format? > Add a DataFrame API that provides functionality similar to HiveQL's > DISTRIBUTE BY > - > > Key: SPARK-11410 > URL: https://issues.apache.org/jira/browse/SPARK-11410 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.5.1 >Reporter: Nong Li >Assignee: Nong Li > Fix For: 1.6.0 > > > DISTRIBUTE BY allows the user to control the partitioning and ordering of a > data set, which can be very useful for some applications.
[jira] [Commented] (SPARK-11410) Add a DataFrame API that provides functionality similar to HiveQL's DISTRIBUTE BY
[ https://issues.apache.org/jira/browse/SPARK-11410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055188#comment-15055188 ] Yin Huai commented on SPARK-11410: -- Yes, we do. For example, if you cache the table after calling repartition, Spark SQL understands that the table has been partitioned and will try to avoid shuffling if a query requires the same column(s) to shuffle data. In the future, we will store the partitioning info in the metastore, so users can pre-shuffle data or co-partition their tables. > Add a DataFrame API that provides functionality similar to HiveQL's > DISTRIBUTE BY > - > > Key: SPARK-11410 > URL: https://issues.apache.org/jira/browse/SPARK-11410 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.5.1 >Reporter: Nong Li >Assignee: Nong Li > Fix For: 1.6.0 > > > DISTRIBUTE BY allows the user to control the partitioning and ordering of a > data set, which can be very useful for some applications.
[jira] [Commented] (SPARK-11410) Add a DataFrame API that provides functionality similar to HiveQL's DISTRIBUTE BY
[ https://issues.apache.org/jira/browse/SPARK-11410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055182#comment-15055182 ] Maciej Bryński commented on SPARK-11410: Is it possible for Spark to use information about partitioning to optimize queries? > Add a DataFrame API that provides functionality similar to HiveQL's > DISTRIBUTE BY > - > > Key: SPARK-11410 > URL: https://issues.apache.org/jira/browse/SPARK-11410 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.5.1 >Reporter: Nong Li >Assignee: Nong Li > Fix For: 1.6.0 > > > DISTRIBUTE BY allows the user to control the partitioning and ordering of a > data set, which can be very useful for some applications.
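What DISTRIBUTE BY (and the DataFrame repartition-by-columns API this issue added) does can be illustrated with a standalone sketch of hash routing: every row goes to the partition chosen by the hash of its key, so rows with equal keys end up co-located. The `distributeBy` function below is a toy model, not Spark's implementation.

```scala
// Toy model of DISTRIBUTE BY: route each row to the partition given by the
// hash of its key, so equal keys are co-located (useful before per-key
// aggregations or joins). floorMod keeps negative hash codes in range.
def distributeBy[K, R](rows: Seq[R], numPartitions: Int)(key: R => K): Map[Int, Seq[R]] =
  rows.groupBy(r => Math.floorMod(key(r).hashCode, numPartitions))

val rows = Seq(("us", 1), ("de", 2), ("us", 3), ("fr", 4))
val parts = distributeBy(rows, 4)(_._1)
// All "us" rows land in exactly one partition:
val usPartitions = parts.collect { case (p, rs) if rs.exists(_._1 == "us") => p }
println(usPartitions.size) // 1
```

This also illustrates the distinction drawn in the comments above: this hash distribution controls how rows are shuffled across tasks, whereas `DataFrameWriter.partitionBy` controls the on-disk directory layout used for partition pruning.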
[jira] [Comment Edited] (SPARK-12288) Support UnsafeRow in Coalesce/Except/Intersect
[ https://issues.apache.org/jira/browse/SPARK-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055110#comment-15055110 ] Xiao Li edited comment on SPARK-12288 at 12/13/15 9:49 PM: --- [~davies] It sounds like Except and Intersect can support UnsafeRow? We just need to add the following lines to Except and Intersect {code} override def outputsUnsafeRows: Boolean = children.forall(_.outputsUnsafeRows) override def canProcessUnsafeRows: Boolean = true override def canProcessSafeRows: Boolean = true {code} Is my understanding correct? Thanks! was (Author: smilegator): [~davies] It sounds like Except and Intersect can support UnsafeRow? We just need to add the following line to Except and Intersect {code} override def canProcessUnsafeRows: Boolean = true {code} Is my understanding correct? Thanks! > Support UnsafeRow in Coalesce/Except/Intersect > -- > > Key: SPARK-12288 > URL: https://issues.apache.org/jira/browse/SPARK-12288 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu >
[jira] [Closed] (SPARK-10798) JsonMappingException with Spark Context Parallelize
[ https://issues.apache.org/jira/browse/SPARK-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dev Lakhani closed SPARK-10798. --- Resolution: Cannot Reproduce > JsonMappingException with Spark Context Parallelize > --- > > Key: SPARK-10798 > URL: https://issues.apache.org/jira/browse/SPARK-10798 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.0 > Environment: Linux, Java 1.8.45 >Reporter: Dev Lakhani > > When trying to create an RDD of Rows using a Java Spark Context and if I > serialize the rows with Kryo first, the sparkContext fails. > byte[] data= Kryo.serialize(List) > List fromKryoRows=Kryo.unserialize(data) > List rows= new Vector(); //using a new set of data. > rows.add(RowFactory.create("test")); > javaSparkContext.parallelize(rows); > OR > javaSparkContext.parallelize(fromKryoRows); //using deserialized rows > I get : > com.fasterxml.jackson.databind.JsonMappingException: (None,None) (of class > scala.Tuple2) (through reference chain: > org.apache.spark.rdd.RDDOperationScope["parent"]) >at > com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:210) >at > com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:177) >at > com.fasterxml.jackson.databind.ser.std.StdSerializer.wrapAndThrow(StdSerializer.java:187) >at > com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:647) >at > com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.java:152) >at > com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:128) >at > com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:2881) >at > com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2338) >at > org.apache.spark.rdd.RDDOperationScope.toJson(RDDOperationScope.scala:50) >at > 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:141) >at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108) >at > org.apache.spark.SparkContext.withScope(SparkContext.scala:700) >at > org.apache.spark.SparkContext.parallelize(SparkContext.scala:714) >at > org.apache.spark.api.java.JavaSparkContext.parallelize(JavaSparkContext.scala:145) >at > org.apache.spark.api.java.JavaSparkContext.parallelize(JavaSparkContext.scala:157) >... > Caused by: scala.MatchError: (None,None) (of class scala.Tuple2) >at > com.fasterxml.jackson.module.scala.ser.OptionSerializer$$anonfun$serialize$1.apply$mcV$sp(OptionSerializerModule.scala:32) >at > com.fasterxml.jackson.module.scala.ser.OptionSerializer$$anonfun$serialize$1.apply(OptionSerializerModule.scala:32) >at > com.fasterxml.jackson.module.scala.ser.OptionSerializer$$anonfun$serialize$1.apply(OptionSerializerModule.scala:32) >at scala.Option.getOrElse(Option.scala:120) >at > com.fasterxml.jackson.module.scala.ser.OptionSerializer.serialize(OptionSerializerModule.scala:31) >at > com.fasterxml.jackson.module.scala.ser.OptionSerializer.serialize(OptionSerializerModule.scala:22) >at > com.fasterxml.jackson.databind.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:505) >at > com.fasterxml.jackson.module.scala.ser.OptionPropertyWriter.serializeAsField(OptionSerializerModule.scala:128) >at > com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:639) >... 19 more > I've tried updating jackson module scala to 2.6.1 but same issue. This > happens in local mode with java 1.8_45. I searched the web and this Jira for > similar issues but found nothing of interest. >
[jira] [Commented] (SPARK-10798) JsonMappingException with Spark Context Parallelize
[ https://issues.apache.org/jira/browse/SPARK-10798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055130#comment-15055130 ] Dev Lakhani commented on SPARK-10798: - byte[] data= Kryo.serialize(List) This is just shorthand for new Kryo().serialize(). I think this issue was a classpath issue, I was not able to reproduce it, but if it reappears I will re-open it. > JsonMappingException with Spark Context Parallelize > --- > > Key: SPARK-10798 > URL: https://issues.apache.org/jira/browse/SPARK-10798 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.0 > Environment: Linux, Java 1.8.45 >Reporter: Dev Lakhani > > When trying to create an RDD of Rows using a Java Spark Context and if I > serialize the rows with Kryo first, the sparkContext fails. > byte[] data= Kryo.serialize(List) > List fromKryoRows=Kryo.unserialize(data) > List rows= new Vector(); //using a new set of data. > rows.add(RowFactory.create("test")); > javaSparkContext.parallelize(rows); > OR > javaSparkContext.parallelize(fromKryoRows); //using deserialized rows > I get : > com.fasterxml.jackson.databind.JsonMappingException: (None,None) (of class > scala.Tuple2) (through reference chain: > org.apache.spark.rdd.RDDOperationScope["parent"]) >at > com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:210) >at > com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:177) >at > com.fasterxml.jackson.databind.ser.std.StdSerializer.wrapAndThrow(StdSerializer.java:187) >at > com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:647) >at > com.fasterxml.jackson.databind.ser.BeanSerializer.serialize(BeanSerializer.java:152) >at > com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:128) >at > 
com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:2881) >at > com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:2338) >at > org.apache.spark.rdd.RDDOperationScope.toJson(RDDOperationScope.scala:50) >at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:141) >at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108) >at > org.apache.spark.SparkContext.withScope(SparkContext.scala:700) >at > org.apache.spark.SparkContext.parallelize(SparkContext.scala:714) >at > org.apache.spark.api.java.JavaSparkContext.parallelize(JavaSparkContext.scala:145) >at > org.apache.spark.api.java.JavaSparkContext.parallelize(JavaSparkContext.scala:157) >... > Caused by: scala.MatchError: (None,None) (of class scala.Tuple2) >at > com.fasterxml.jackson.module.scala.ser.OptionSerializer$$anonfun$serialize$1.apply$mcV$sp(OptionSerializerModule.scala:32) >at > com.fasterxml.jackson.module.scala.ser.OptionSerializer$$anonfun$serialize$1.apply(OptionSerializerModule.scala:32) >at > com.fasterxml.jackson.module.scala.ser.OptionSerializer$$anonfun$serialize$1.apply(OptionSerializerModule.scala:32) >at scala.Option.getOrElse(Option.scala:120) >at > com.fasterxml.jackson.module.scala.ser.OptionSerializer.serialize(OptionSerializerModule.scala:31) >at > com.fasterxml.jackson.module.scala.ser.OptionSerializer.serialize(OptionSerializerModule.scala:22) >at > com.fasterxml.jackson.databind.ser.BeanPropertyWriter.serializeAsField(BeanPropertyWriter.java:505) >at > com.fasterxml.jackson.module.scala.ser.OptionPropertyWriter.serializeAsField(OptionSerializerModule.scala:128) >at > com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.serializeFields(BeanSerializerBase.java:639) >... 19 more > I've tried updating jackson module scala to 2.6.1 but same issue. This > happens in local mode with java 1.8_45. I searched the web and this Jira for > similar issues but found nothing of interest. 
> -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
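The `Kryo.serialize(...)`/`Kryo.unserialize(...)` calls in the report are shorthand, as the commenter notes. A minimal, hypothetical sketch of the intended round-trip, using plain JDK serialization as a stand-in for Kryo (the real Kryo library works through its `Output`/`Input` classes, and the `javaSparkContext.parallelize(...)` step needs a live Spark context, so it is shown only as a comment):

```java
import java.io.*;
import java.util.*;

public class RoundTripSketch {
    // Serialize a list of rows to bytes (stand-in for the report's Kryo.serialize(List)).
    static byte[] serialize(List<String> rows) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new ArrayList<>(rows));
        }
        return bytes.toByteArray();
    }

    // Deserialize the bytes back into a list (stand-in for Kryo.unserialize(data)).
    @SuppressWarnings("unchecked")
    static List<String> deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (List<String>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> rows = Arrays.asList("test");
        List<String> fromBytes = deserialize(serialize(rows));
        // In the report, the deserialized rows are then handed to Spark:
        //   javaSparkContext.parallelize(fromBytes);  // fails with JsonMappingException
        System.out.println(fromBytes);
    }
}
```

Note that the stack trace points at jackson-module-scala's `OptionSerializer` hitting a `MatchError` on a `(None,None)` tuple while `RDDOperationScope.toJson` runs, i.e. the failure occurs in Spark's scope-JSON generation, not in the user's serialization round-trip itself.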
[jira] [Comment Edited] (SPARK-12288) Support UnsafeRow in Coalesce/Except/Intersect
[ https://issues.apache.org/jira/browse/SPARK-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055110#comment-15055110 ] Xiao Li edited comment on SPARK-12288 at 12/13/15 8:27 PM: --- [~davies] It sounds like Except and Intersect can support UnsafeRow? We just need to add the following line to Except and Intersect {code} override def canProcessUnsafeRows: Boolean = true {code} Is my understanding correct? Thanks! was (Author: smilegator): [~davies] It sounds like Except and Intersect can support UnsafeRow? We just need to add the following line to Except and Intersect {code} override def canProcessUnsafeRows: Boolean = true {code} Thanks! > Support UnsafeRow in Coalesce/Except/Intersect > -- > > Key: SPARK-12288 > URL: https://issues.apache.org/jira/browse/SPARK-12288 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12288) Support UnsafeRow in Coalesce/Except/Intersect
[ https://issues.apache.org/jira/browse/SPARK-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055110#comment-15055110 ] Xiao Li commented on SPARK-12288: - [~davies] It sounds like Except and Intersect can support UnsafeRow? We just need to add the following line to Except and Intersect {code} override def canProcessUnsafeRows: Boolean = true {code} Thanks! > Support UnsafeRow in Coalesce/Except/Intersect > -- > > Key: SPARK-12288 > URL: https://issues.apache.org/jira/browse/SPARK-12288 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12311) [CORE] Restore previous value of "os.arch" property in test suites after forcing to set specific value to "os.arch" property
[ https://issues.apache.org/jira/browse/SPARK-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055073#comment-15055073 ] Sean Owen commented on SPARK-12311: --- LGTM, feel free to make a pull request. > [CORE] Restore previous value of "os.arch" property in test suites after > forcing to set specific value to "os.arch" property > > > Key: SPARK-12311 > URL: https://issues.apache.org/jira/browse/SPARK-12311 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2 >Reporter: Kazuaki Ishizaki >Priority: Minor > > Although current BlockManagerSuite.scala and SizeEstimatorSuite.scala set the > specific value (e.g. "amd64") into system property "os.arch", they do not > restore the original value of "os.arch" after these test suites. This may > lead to failures in a test case that depends on architecture on other > platform rather than amd64. > They should save the original value of "os.arch" and restore this at the end > of these test suites. > > https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala > https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/util/SizeEstimatorSuite.scala -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12312) JDBC connection to Kerberos secured databases fails on remote executors
[ https://issues.apache.org/jira/browse/SPARK-12312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nabacg updated SPARK-12312: --- Description: When loading DataFrames from JDBC datasource with Kerberos authentication, remote executors (yarn-client/cluster etc. modes) fail to establish a connection due to lack of Kerberos ticket or ability to generate it. This is a real issue when trying to ingest data from kerberized data sources (SQL Server, Oracle) in enterprise environment where exposing simple authentication access is not an option due to IT policy issues. was:When loading DataFrames from JDBC datasource with Kerberos authentication (SQL Server, Oracle), remote executors (yarn-client/cluster etc. modes) fail to establish a connection due to lack of Kerberos ticket or ability to generate it. > JDBC connection to Kerberos secured databases fails on remote executors > --- > > Key: SPARK-12312 > URL: https://issues.apache.org/jira/browse/SPARK-12312 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.2 >Reporter: nabacg >Priority: Minor > > When loading DataFrames from JDBC datasource with Kerberos authentication, > remote executors (yarn-client/cluster etc. modes) fail to establish a > connection due to lack of Kerberos ticket or ability to generate it. > This is a real issue when trying to ingest data from kerberized data sources > (SQL Server, Oracle) in enterprise environment where exposing simple > authentication access is not an option due to IT policy issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-12312) JDBC connection to Kerberos secured databases fails on remote executors
nabacg created SPARK-12312: -- Summary: JDBC connection to Kerberos secured databases fails on remote executors Key: SPARK-12312 URL: https://issues.apache.org/jira/browse/SPARK-12312 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.2 Reporter: nabacg Priority: Minor When loading DataFrames from a JDBC data source with Kerberos authentication (SQL Server, Oracle), remote executors (yarn-client/cluster etc. modes) fail to establish a connection because they lack a Kerberos ticket or the ability to generate one. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12293) Support UnsafeRow in LocalTableScan
[ https://issues.apache.org/jira/browse/SPARK-12293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12293: Assignee: Apache Spark > Support UnsafeRow in LocalTableScan > --- > > Key: SPARK-12293 > URL: https://issues.apache.org/jira/browse/SPARK-12293 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12293) Support UnsafeRow in LocalTableScan
[ https://issues.apache.org/jira/browse/SPARK-12293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12293: Assignee: (was: Apache Spark) > Support UnsafeRow in LocalTableScan > --- > > Key: SPARK-12293 > URL: https://issues.apache.org/jira/browse/SPARK-12293 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12293) Support UnsafeRow in LocalTableScan
[ https://issues.apache.org/jira/browse/SPARK-12293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055019#comment-15055019 ] Apache Spark commented on SPARK-12293: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/10283 > Support UnsafeRow in LocalTableScan > --- > > Key: SPARK-12293 > URL: https://issues.apache.org/jira/browse/SPARK-12293 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12311) [CORE] Restore previous value of "os.arch" property in test suites after forcing to set specific value to "os.arch" property
[ https://issues.apache.org/jira/browse/SPARK-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki updated SPARK-12311: - Description: Although current BlockManagerSuite.scala and SizeEstimatorSuite.scala set the specific value (e.g. "amd64") into system property "os.arch", they do not restore the original value of "os.arch" after these test suites. This may lead to failures in a test case that depends on architecture on other platform rather than amd64. They should save the original value of "os.arch" and restore this at the end of these test suites. https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/util/SizeEstimatorSuite.scala was: Although current BlockManagerSuite.scala and SizeEstimatorSuite.scala set the specific value (e.g. "amd64") into system property "os.arch", they do not restore the original value of "os.arch" after these test suites. This may lead to failures in test cases that depends on architecture on other platform rather than amd64. They should save the original value of "os.arch" and restore this at the end of these test suites. https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/util/SizeEstimatorSuite.scala > [CORE] Restore previous value of "os.arch" property in test suites after > forcing to set specific value to "os.arch" property > > > Key: SPARK-12311 > URL: https://issues.apache.org/jira/browse/SPARK-12311 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2 >Reporter: Kazuaki Ishizaki >Priority: Minor > > Although current BlockManagerSuite.scala and SizeEstimatorSuite.scala set the > specific value (e.g. 
"amd64") into system property "os.arch", they do not > restore the original value of "os.arch" after these test suites. This may > lead to failures in a test case that depends on architecture on other > platform rather than amd64. > They should save the original value of "os.arch" and restore this at the end > of these test suites. > > https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala > https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/util/SizeEstimatorSuite.scala -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12311) [CORE] Restore previous value of "os.arch" property in test suites after forcing to set specific value to "os.arch" property
[ https://issues.apache.org/jira/browse/SPARK-12311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki updated SPARK-12311: - Summary: [CORE] Restore previous value of "os.arch" property in test suites after forcing to set specific value to "os.arch" property (was: [CORE] Restore previous "os.arch" property in test suites after forcing to set specific value to "os.arch" property) > [CORE] Restore previous value of "os.arch" property in test suites after > forcing to set specific value to "os.arch" property > > > Key: SPARK-12311 > URL: https://issues.apache.org/jira/browse/SPARK-12311 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2 >Reporter: Kazuaki Ishizaki >Priority: Minor > > Although current BlockManagerSuite.scala and SizeEstimatorSuite.scala set the > specific value (e.g. "amd64") into system property "os.arch", they do not > restore the original value of "os.arch" after these test suites. This may > lead to failures in test cases that depends on architecture on other platform > rather than amd64. > They should save the original value of "os.arch" and restore this at the end > of these test suites. > > https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala > https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/util/SizeEstimatorSuite.scala -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-12311) [CORE] Restore previous "os.arch" property in test suites after forcing to set specific value to "os.arch" property
Kazuaki Ishizaki created SPARK-12311: Summary: [CORE] Restore previous "os.arch" property in test suites after forcing to set specific value to "os.arch" property Key: SPARK-12311 URL: https://issues.apache.org/jira/browse/SPARK-12311 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.5.2 Reporter: Kazuaki Ishizaki Priority: Minor Although BlockManagerSuite.scala and SizeEstimatorSuite.scala currently set a specific value (e.g. "amd64") into the system property "os.arch", they do not restore the original value of "os.arch" after these test suites run. This may lead to failures in test cases that depend on the architecture, on platforms other than amd64. They should save the original value of "os.arch" and restore it at the end of these test suites. https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/util/SizeEstimatorSuite.scala -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
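The save-and-restore pattern the issue asks for can be sketched in plain Java. The helper below is a hypothetical illustration of the proposal, not Spark's actual test code; only the property name `"os.arch"` and the example value `"amd64"` come from the issue:

```java
public class OsArchRestoreSketch {
    // Run a body under a forced "os.arch" value, then restore the original,
    // as the issue proposes for BlockManagerSuite/SizeEstimatorSuite.
    static void withOsArch(String forced, Runnable body) {
        String original = System.getProperty("os.arch");
        System.setProperty("os.arch", forced);
        try {
            body.run();
        } finally {
            // Restore even if the body throws, so later tests see the real value.
            if (original == null) {
                System.clearProperty("os.arch");
            } else {
                System.setProperty("os.arch", original);
            }
        }
    }

    public static void main(String[] args) {
        String before = System.getProperty("os.arch");
        withOsArch("amd64", () ->
            System.out.println("inside: " + System.getProperty("os.arch")));
        System.out.println("restored: " + System.getProperty("os.arch").equals(before));
    }
}
```

The `finally` block is the essential part: without it, a failing assertion inside the suite would leave the forced value behind for every subsequent test.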
[jira] [Commented] (SPARK-12306) Add an option to ignore BlockRDD partition data loss
[ https://issues.apache.org/jira/browse/SPARK-12306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15054995#comment-15054995 ] Liwei Lin commented on SPARK-12306: --- Sorry for the inconvenience; I've added a more detailed description, [~srowen] would you mind reopening it? Thanks. :-) > Add an option to ignore BlockRDD partition data loss > > > Key: SPARK-12306 > URL: https://issues.apache.org/jira/browse/SPARK-12306 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.5.2 >Reporter: Liwei Lin > > Currently in Spark Streaming, a Receiver stores the received data into a > BlockManager, and the data is later used by a BlockRDD. If this BlockManager > is lost because of some failure, the BlockRDD throws a SparkException saying > "Could not compute split, block not found". > In most cases this is the right thing to do. However, in a streaming scenario > that can tolerate small pieces of data loss, moving on silently instead of > throwing an exception may be preferable. > This issue proposes adding such a "spark.streaming.ignoreBlockNotFound" > option, defaulting to false, to tell whether to throw an exception or just > move on when a block is not found. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
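The proposed flag boils down to a throw-or-skip decision at block lookup time. A rough sketch of that decision in plain Java; the store, method, and exception type here are illustrative, not Spark's BlockRDD code, and only the property name `spark.streaming.ignoreBlockNotFound` and the error message come from the issue:

```java
import java.util.*;

public class IgnoreBlockNotFoundSketch {
    // Look up a block's data; if the block is missing, either treat the
    // partition as empty (tolerate the loss) or fail, per the proposed flag.
    static List<String> compute(Map<String, List<String>> store,
                                String blockId,
                                boolean ignoreBlockNotFound) {
        List<String> data = store.get(blockId);
        if (data != null) {
            return data;
        }
        if (ignoreBlockNotFound) {
            return Collections.emptyList(); // move on silently
        }
        // Default behavior (flag off): fail, as BlockRDD does today.
        throw new IllegalStateException(
            "Could not compute split, block not found: " + blockId);
    }

    public static void main(String[] args) {
        Map<String, List<String>> store = new HashMap<>();
        store.put("input-0-1", Arrays.asList("a", "b"));
        System.out.println(compute(store, "input-0-1", false)); // block present
        System.out.println(compute(store, "input-0-2", true));  // missing, flag on
    }
}
```

Defaulting the flag to false, as the issue proposes, preserves the current fail-fast behavior unless a user explicitly opts in to tolerating loss.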
[jira] [Updated] (SPARK-12306) Add an option to ignore BlockRDD partition data loss
[ https://issues.apache.org/jira/browse/SPARK-12306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liwei Lin updated SPARK-12306: -- Description: Currently in Spark Streaming, a Receiver stores the received data into some BlockManager and then later the data will be used by a BlockRDD. If this BlockManager were to lost because of some failure, then this BlockRDD would throw a SparkException saying "Could not compute split, block not found". In most cases this is the right thing to do. However, in a streaming scenario where it can tolerant small pieces of data loss, maybe just move on silently -- instead of throwing an exception -- is more preferable. This issue proposes to add such an "spark.streaming.ignoreBlockNotFound" option, which defaults to false, to tell whether to throw an exception or just move on when a block is not found. was: Currently in Spark Streaming, a Receiver stores the received data into some BlockManager and then later the data will be used by a BlockRDD. If this BlockManager were to lost because of some failure, then this BlockRDD would throw a SparkException with the "Could not compute split, block not found" message. In most cases this is the right thing to do. But in a streaming scenario where it can tolerant small pieces of data loss, maybe just move on silently -- instead of throwing an exception -- is more preferable. This issue proposes to add such an "spark.streaming.ignoreBlockNotFound" option, which defaults to false, the tell whether to throw an exception or just move on when a block is not found. > Add an option to ignore BlockRDD partition data loss > > > Key: SPARK-12306 > URL: https://issues.apache.org/jira/browse/SPARK-12306 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.5.2 >Reporter: Liwei Lin > > Currently in Spark Streaming, a Receiver stores the received data into some > BlockManager and then later the data will be used by a BlockRDD. 
If this > BlockManager were to lost because of some failure, then this BlockRDD would > throw a SparkException saying "Could not compute split, block not found". > In most cases this is the right thing to do. However, in a streaming scenario > where it can tolerant small pieces of data loss, maybe just move on silently > -- instead of throwing an exception -- is more preferable. > This issue proposes to add such an "spark.streaming.ignoreBlockNotFound" > option, which defaults to false, to tell whether to throw an exception or > just move on when a block is not found. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12306) Add an option to ignore BlockRDD partition data loss
[ https://issues.apache.org/jira/browse/SPARK-12306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liwei Lin updated SPARK-12306: -- Description: Currently in Spark Streaming, a Receiver stores the received data into some BlockManager and then later the data will be used by a BlockRDD. If this BlockManager were to lost because of some failure, then this BlockRDD would throw a SparkException with the "Could not compute split, block not found" message. In most cases this is the right thing to do. But in a streaming scenario where it can tolerant small pieces of data loss, maybe just move on silently -- instead of throwing an exception -- is more preferable. This issue proposes to add such an "spark.streaming.ignoreBlockNotFound" option, which defaults to false, the tell whether to throw an exception or just move on when a block is not found. > Add an option to ignore BlockRDD partition data loss > > > Key: SPARK-12306 > URL: https://issues.apache.org/jira/browse/SPARK-12306 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.5.2 >Reporter: Liwei Lin > > Currently in Spark Streaming, a Receiver stores the received data into some > BlockManager and then later the data will be used by a BlockRDD. If this > BlockManager were to lost because of some failure, then this BlockRDD would > throw a SparkException with the "Could not compute split, block not found" > message. > In most cases this is the right thing to do. But in a streaming scenario > where it can tolerant small pieces of data loss, maybe just move on silently > -- instead of throwing an exception -- is more preferable. > This issue proposes to add such an "spark.streaming.ignoreBlockNotFound" > option, which defaults to false, the tell whether to throw an exception or > just move on when a block is not found. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12305) Add Receiver scheduling info onto Spark Streaming web UI
[ https://issues.apache.org/jira/browse/SPARK-12305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15054983#comment-15054983 ] Liwei Lin commented on SPARK-12305: --- Sorry for the inconvenience; I've added a more detailed description, [~srowen] would you mind reopening it? Thanks. :-) > Add Receiver scheduling info onto Spark Streaming web UI > > > Key: SPARK-12305 > URL: https://issues.apache.org/jira/browse/SPARK-12305 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.5.2 >Reporter: Liwei Lin >Priority: Minor > > Spark 1.5 has added better Receiver scheduling support, via which users can > deploy Receivers to certain Executors in a way they wish. > However, neither 'Receiver.preferredLocations' info nor the candidate > Executors info are displayed on the web UI. Then when Receivers are not > scheduled in the way users have specified, it's non-trivial for the users to > find out why. > This issue proposes to add Receiver scheduling info, including > 'Receiver.preferredLocations' info as well as the candidate Executors info, > onto Spark Streaming web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12305) Add Receiver scheduling info onto Spark Streaming web UI
[ https://issues.apache.org/jira/browse/SPARK-12305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liwei Lin updated SPARK-12305: -- Description: Spark 1.5 has added better Receiver scheduling support, via which users can deploy Receivers to certain Executors in a way they wish. However, neither 'Receiver.preferredLocations' info nor the candidate Executors info are displayed on the web UI. Then when Receivers are not scheduled in the way users have specified, it's non-trivial for the users to find out why. This issue proposes to add Receiver scheduling info, including 'Receiver.preferredLocations' info as well as the candidate Executors info, onto Spark Streaming web UI. was:Spark 1.5 has added better Receiver scheduling support, > Add Receiver scheduling info onto Spark Streaming web UI > > > Key: SPARK-12305 > URL: https://issues.apache.org/jira/browse/SPARK-12305 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.5.2 >Reporter: Liwei Lin >Priority: Minor > > Spark 1.5 has added better Receiver scheduling support, via which users can > deploy Receivers to certain Executors in a way they wish. > However, neither 'Receiver.preferredLocations' info nor the candidate > Executors info are displayed on the web UI. Then when Receivers are not > scheduled in the way users have specified, it's non-trivial for the users to > find out why. > This issue proposes to add Receiver scheduling info, including > 'Receiver.preferredLocations' info as well as the candidate Executors info, > onto Spark Streaming web UI. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12305) Add Receiver scheduling info onto Spark Streaming web UI
[ https://issues.apache.org/jira/browse/SPARK-12305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liwei Lin updated SPARK-12305: -- Description: Spark 1.5 has added better Receiver scheduling support, > Add Receiver scheduling info onto Spark Streaming web UI > > > Key: SPARK-12305 > URL: https://issues.apache.org/jira/browse/SPARK-12305 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.5.2 >Reporter: Liwei Lin >Priority: Minor > > Spark 1.5 has added better Receiver scheduling support, -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12310) Add write.json and write.parquet for SparkR
[ https://issues.apache.org/jira/browse/SPARK-12310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12310: Assignee: Apache Spark > Add write.json and write.parquet for SparkR > --- > > Key: SPARK-12310 > URL: https://issues.apache.org/jira/browse/SPARK-12310 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Reporter: Yanbo Liang >Assignee: Apache Spark > > Add write.json and write.parquet for SparkR -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12309) Use sqlContext from MLlibTestSparkContext for spark.ml test suites
[ https://issues.apache.org/jira/browse/SPARK-12309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12309: Assignee: (was: Apache Spark) > Use sqlContext from MLlibTestSparkContext for spark.ml test suites > -- > > Key: SPARK-12309 > URL: https://issues.apache.org/jira/browse/SPARK-12309 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Yanbo Liang > > Use sqlContext from MLlibTestSparkContext rather than creating new one for > spark.ml test cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12309) Use sqlContext from MLlibTestSparkContext for spark.ml test suites
[ https://issues.apache.org/jira/browse/SPARK-12309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12309: Assignee: Apache Spark > Use sqlContext from MLlibTestSparkContext for spark.ml test suites > -- > > Key: SPARK-12309 > URL: https://issues.apache.org/jira/browse/SPARK-12309 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Yanbo Liang >Assignee: Apache Spark > > Use sqlContext from MLlibTestSparkContext rather than creating new one for > spark.ml test cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-12306) Add an option to ignore BlockRDD partition data loss
[ https://issues.apache.org/jira/browse/SPARK-12306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-12306. --- Resolution: Invalid > Add an option to ignore BlockRDD partition data loss > > > Key: SPARK-12306 > URL: https://issues.apache.org/jira/browse/SPARK-12306 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.5.2 >Reporter: Liwei Lin > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12282) Document spark.jars
[ https://issues.apache.org/jira/browse/SPARK-12282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15054927#comment-15054927 ] Sean Owen commented on SPARK-12282: --- Yeah, I suspect it happens to work fine, since that's what --jars sets too. I don't think it's tested/guaranteed as an API for all modes. I can see wanting to make everything a conf value, but lots of things aren't at this stage anyway. (And if you really want to, you can set it this way if you're OK with it maybe not working in a future version.) I don't think it adds up to a need to expose this property. > Document spark.jars > --- > > Key: SPARK-12282 > URL: https://issues.apache.org/jira/browse/SPARK-12282 > Project: Spark > Issue Type: Documentation > Components: Documentation >Reporter: Justin Bailey >Priority: Trivial > > The spark.jars property (as implemented in SparkSubmit.scala, > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L516) > is not documented anywhere, and should be. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-12305) Add Receiver scheduling info onto Spark Streaming web UI
[ https://issues.apache.org/jira/browse/SPARK-12305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-12305. --- Resolution: Invalid > Add Receiver scheduling info onto Spark Streaming web UI > > > Key: SPARK-12305 > URL: https://issues.apache.org/jira/browse/SPARK-12305 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.5.2 >Reporter: Liwei Lin >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-12238) s/Advanced sources/External Sources in docs.
[ https://issues.apache.org/jira/browse/SPARK-12238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-12238. --- Resolution: Won't Fix > s/Advanced sources/External Sources in docs. > > > Key: SPARK-12238 > URL: https://issues.apache.org/jira/browse/SPARK-12238 > Project: Spark > Issue Type: Improvement > Components: Documentation, Streaming >Reporter: Prashant Sharma > > While reading the docs, I felt reading as external sources(instead of > Advanced sources) seemed more appropriate as in they belong outside streaming > core project. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12260) Graceful Shutdown with In-Memory State
[ https://issues.apache.org/jira/browse/SPARK-12260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15054923#comment-15054923 ] Sean Owen commented on SPARK-12260: --- I'm asking what you can't do with updateStateByKey that you need to do? This sounds pretty app-specific, and something you can maintain in your own app or repo to start. > Graceful Shutdown with In-Memory State > -- > > Key: SPARK-12260 > URL: https://issues.apache.org/jira/browse/SPARK-12260 > Project: Spark > Issue Type: New Feature > Components: Streaming >Reporter: Mao, Wei > Labels: streaming > > Users often stop and restart their streaming jobs for tasks such as > maintenance, software upgrades or even application logic updates. When a job > re-starts it should pick up where it left off i.e. any state information that > existed when the job stopped should be used as the initial state when the job > restarts. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-11707) StreamCorruptedException if authentication is enabled
[ https://issues.apache.org/jira/browse/SPARK-11707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-11707.
-------------------------------
    Resolution: Cannot Reproduce

> StreamCorruptedException if authentication is enabled
>
> Key: SPARK-11707
> URL: https://issues.apache.org/jira/browse/SPARK-11707
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Reporter: Jacek Lewandowski
>
> When authentication (and encryption) is enabled (at least in standalone mode), the following code (in the Spark shell):
> {code}
> sc.makeRDD(1 to 10, 10).map(x => x*x).map(_.toString).reduce(_ + _)
> {code}
> finishes with an exception:
> {noformat}
> [Stage 0:> (0 + 8) / 10]15/11/12 20:36:29 ERROR TransportRequestHandler: Error while invoking RpcHandler#receive() on RPC id 5750598674048943239
> java.io.StreamCorruptedException: invalid type code: 30
>     at java.io.ObjectInputStream$BlockDataInputStream.readBlockHeader(ObjectInputStream.java:2508)
>     at java.io.ObjectInputStream$BlockDataInputStream.refill(ObjectInputStream.java:2543)
>     at java.io.ObjectInputStream$BlockDataInputStream.read(ObjectInputStream.java:2702)
>     at java.io.ObjectInputStream.read(ObjectInputStream.java:865)
>     at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
>     at org.apache.spark.util.SerializableBuffer$$anonfun$readObject$1.apply(SerializableBuffer.scala:38)
>     at org.apache.spark.util.SerializableBuffer$$anonfun$readObject$1.apply(SerializableBuffer.scala:32)
>     at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1186)
>     at org.apache.spark.util.SerializableBuffer.readObject(SerializableBuffer.scala:32)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
>     at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:109)
>     at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$deserialize$1.apply(NettyRpcEnv.scala:248)
>     at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
>     at org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:296)
>     at org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:247)
>     at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:448)
>     at org.apache.spark.network.sasl.SaslRpcHandler.receive(SaslRpcHandler.java:76)
>     at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:122)
>     at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:94)
>     at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:101)
>     at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
>     at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>     at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
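The one-line repro from the report can also be run locally, without Spark, to show what the job computes when deserialization succeeds; the failure above happens in the RPC layer, not in this logic. Note the reduce is string concatenation, which is associative but not commutative, so in a distributed run the result depends on partition ordering:

```scala
// Local equivalent of the Spark-shell snippet from the report:
// square 1..10, stringify, and concatenate in order.
val result = (1 to 10).map(x => x * x).map(_.toString).reduce(_ + _)
// concatenation of "1", "4", "9", ..., "100"
```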
[jira] [Updated] (SPARK-12284) Output UnsafeRow from window function
[ https://issues.apache.org/jira/browse/SPARK-12284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-12284:
------------------------------
    Component/s: SQL

> Output UnsafeRow from window function
>
> Key: SPARK-12284
> URL: https://issues.apache.org/jira/browse/SPARK-12284
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Davies Liu
[jira] [Updated] (SPARK-12295) Manage the memory used by window function
[ https://issues.apache.org/jira/browse/SPARK-12295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-12295:
------------------------------
    Component/s: SQL

> Manage the memory used by window function
>
> Key: SPARK-12295
> URL: https://issues.apache.org/jira/browse/SPARK-12295
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Davies Liu
>
> The buffered rows for a given frame should use UnsafeRow, and be stored as pages.
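The ticket's one-line description (buffer a frame's rows as pages rather than one growing array) can be illustrated with a small, purely hypothetical sketch; the class and method names below are invented for illustration and are not Spark's actual window-exec internals:

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch: buffer rows for a window frame in fixed-size pages,
// so memory can be accounted for and released per page instead of per buffer.
class PagedRowBuffer[T](pageSize: Int) {
  private val pages = ArrayBuffer[ArrayBuffer[T]]()

  def add(row: T): Unit = {
    // Start a new page when the current one is full (or none exists yet).
    if (pages.isEmpty || pages.last.size == pageSize) pages += ArrayBuffer[T]()
    pages.last += row
  }

  // Random access by global index: locate the page, then the slot within it.
  def apply(i: Int): T = pages(i / pageSize)(i % pageSize)

  def size: Int = pages.map(_.size).sum
}
```

The point of the design is that each page is a fixed-size allocation unit, which maps naturally onto a memory manager that grants and reclaims pages.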
[jira] [Updated] (SPARK-12289) Support UnsafeRow in TakeOrderedAndProject/Limit
[ https://issues.apache.org/jira/browse/SPARK-12289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-12289:
------------------------------
    Component/s: SQL

> Support UnsafeRow in TakeOrderedAndProject/Limit
>
> Key: SPARK-12289
> URL: https://issues.apache.org/jira/browse/SPARK-12289
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Davies Liu
[jira] [Updated] (SPARK-12287) Support UnsafeRow in MapPartitions/MapGroups/CoGroup
[ https://issues.apache.org/jira/browse/SPARK-12287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-12287:
------------------------------
    Component/s: SQL

> Support UnsafeRow in MapPartitions/MapGroups/CoGroup
>
> Key: SPARK-12287
> URL: https://issues.apache.org/jira/browse/SPARK-12287
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Davies Liu
[jira] [Updated] (SPARK-12291) Support UnsafeRow in BroadcastLeftSemiJoinHash
[ https://issues.apache.org/jira/browse/SPARK-12291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-12291:
------------------------------
    Component/s: SQL

> Support UnsafeRow in BroadcastLeftSemiJoinHash
>
> Key: SPARK-12291
> URL: https://issues.apache.org/jira/browse/SPARK-12291
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Davies Liu
[jira] [Updated] (SPARK-12288) Support UnsafeRow in Coalesce/Except/Intersect
[ https://issues.apache.org/jira/browse/SPARK-12288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-12288:
------------------------------
    Component/s: SQL

> Support UnsafeRow in Coalesce/Except/Intersect
>
> Key: SPARK-12288
> URL: https://issues.apache.org/jira/browse/SPARK-12288
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Davies Liu
[jira] [Updated] (SPARK-12293) Support UnsafeRow in LocalTableScan
[ https://issues.apache.org/jira/browse/SPARK-12293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-12293:
------------------------------
    Component/s: SQL

> Support UnsafeRow in LocalTableScan
>
> Key: SPARK-12293
> URL: https://issues.apache.org/jira/browse/SPARK-12293
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Davies Liu
[jira] [Updated] (SPARK-12290) Change the default value in SparkPlan
[ https://issues.apache.org/jira/browse/SPARK-12290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-12290:
------------------------------
    Component/s: SQL

> Change the default value in SparkPlan
>
> Key: SPARK-12290
> URL: https://issues.apache.org/jira/browse/SPARK-12290
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Davies Liu
>
> supportUnsafeRows = true
> supportSafeRows = false
> outputUnsafeRows = true
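The three flags listed in this ticket's description can be read as proposed trait defaults: operators process and output UnsafeRow by default, and safe-row-only operators must opt out. The trait and member names below are a hypothetical illustration mirroring the ticket text, not the actual SparkPlan API:

```scala
// Hypothetical sketch of the proposed defaults for a physical-plan trait.
trait PlanRowDefaults {
  def supportUnsafeRows: Boolean = true   // operators handle UnsafeRow by default
  def supportSafeRows: Boolean = false    // safe (non-Unsafe) rows are the exception
  def outputUnsafeRows: Boolean = true    // output format defaults to UnsafeRow
}

// An operator that still needs safe rows would override the defaults:
object LegacyScan extends PlanRowDefaults {
  override def supportSafeRows: Boolean = true
  override def outputUnsafeRows: Boolean = false
}

object DefaultOperator extends PlanRowDefaults
```

Flipping the defaults this way means new operators get the UnsafeRow fast path for free, and only the shrinking set of legacy operators carry overrides.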
[jira] [Updated] (SPARK-12292) Support UnsafeRow in Generate
[ https://issues.apache.org/jira/browse/SPARK-12292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-12292:
------------------------------
    Component/s: SQL

> Support UnsafeRow in Generate
>
> Key: SPARK-12292
> URL: https://issues.apache.org/jira/browse/SPARK-12292
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Davies Liu
[jira] [Updated] (SPARK-12294) Support UnsafeRow in HiveTableScan
[ https://issues.apache.org/jira/browse/SPARK-12294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-12294:
------------------------------
    Component/s: SQL

> Support UnsafeRow in HiveTableScan
>
> Key: SPARK-12294
> URL: https://issues.apache.org/jira/browse/SPARK-12294
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Davies Liu
[jira] [Created] (SPARK-12310) Add write.json and write.parquet for SparkR
Yanbo Liang created SPARK-12310:
--------------------------------
    Summary: Add write.json and write.parquet for SparkR
    Key: SPARK-12310
    URL: https://issues.apache.org/jira/browse/SPARK-12310
    Project: Spark
    Issue Type: Sub-task
    Components: SparkR
    Reporter: Yanbo Liang

Add write.json and write.parquet for SparkR