[jira] [Commented] (SPARK-18553) Executor loss may cause TaskSetManager to be leaked

2016-11-29 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15707105#comment-15707105
 ] 

Apache Spark commented on SPARK-18553:
--

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/16070

> Executor loss may cause TaskSetManager to be leaked
> ---
>
> Key: SPARK-18553
> URL: https://issues.apache.org/jira/browse/SPARK-18553
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 1.6.0, 2.0.0, 2.1.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Blocker
> Fix For: 2.0.3, 2.1.0, 2.2.0
>
>
> Due to a bug in TaskSchedulerImpl, the complete sudden loss of an executor 
> may cause a TaskSetManager to be leaked, causing ShuffleDependencies and 
> other data structures to be kept alive indefinitely, leading to various types 
> of resource leaks (including shuffle file leaks).
> In a nutshell, the problem is that TaskSchedulerImpl did not maintain its own 
> mapping from executorId to running task ids, leaving it unable to clean up 
> taskId to taskSetManager maps when an executor is totally lost.
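As a rough illustration of the fix described above (not the actual TaskSchedulerImpl code; the class and member names below are simplified assumptions), keeping a per-executor set of running task IDs lets the scheduler drop every taskId-to-TaskSetManager entry when an executor is suddenly lost:

{code}
import scala.collection.mutable

// Simplified stand-ins for the real scheduler types (illustrative only).
class TaskSetManager(val name: String)

class SimpleScheduler {
  // taskId -> manager that owns the task
  private val taskIdToTaskSetManager = mutable.HashMap[Long, TaskSetManager]()
  // executorId -> ids of tasks currently running on that executor
  private val executorIdToRunningTaskIds = mutable.HashMap[String, mutable.HashSet[Long]]()

  def taskStarted(taskId: Long, executorId: String, manager: TaskSetManager): Unit = synchronized {
    taskIdToTaskSetManager(taskId) = manager
    executorIdToRunningTaskIds.getOrElseUpdate(executorId, mutable.HashSet[Long]()) += taskId
  }

  def taskFinished(taskId: Long, executorId: String): Unit = synchronized {
    taskIdToTaskSetManager -= taskId
    executorIdToRunningTaskIds.get(executorId).foreach(_ -= taskId)
  }

  // Without the executor-to-task-ids map, the entries for tasks that were still
  // running on a lost executor could never be found and removed, i.e. they leak.
  def executorLost(executorId: String): Unit = synchronized {
    for (taskIds <- executorIdToRunningTaskIds.remove(executorId); taskId <- taskIds) {
      taskIdToTaskSetManager -= taskId
    }
  }
}
{code}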






[jira] [Commented] (SPARK-18635) Partition name/values not escaped correctly in some cases

2016-11-29 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15707211#comment-15707211
 ] 

Apache Spark commented on SPARK-18635:
--

User 'ericl' has created a pull request for this issue:
https://github.com/apache/spark/pull/16071

> Partition name/values not escaped correctly in some cases
> -
>
> Key: SPARK-18635
> URL: https://issues.apache.org/jira/browse/SPARK-18635
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Eric Liang
>Priority: Critical
>
> For example, the following command does not insert data properly into the 
> table
> {code}
> spark.sqlContext.range(10).selectExpr("id", "id as A", "'A$\\=%' as B")
>   .write.partitionBy("A", "B").mode("overwrite").saveAsTable("testy")
> {code}
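For background, partition directories are built from {{column=value}} path segments, so characters such as {{\}}, {{=}} or {{%}} in a value must be escaped before they become part of the path. The sketch below is illustrative only; {{escapePathName}} is a hypothetical helper, not Spark's internal implementation, and the exact set of escaped characters is an assumption.

{code}
object PartitionPathEscaping {
  // Characters assumed unsafe to appear literally in a partition path segment.
  private val unsafe: Set[Char] = Set('"', '#', '%', '\'', '*', '/', ':', '=', '?', '\\', '{', '}')

  // Hypothetical helper: percent-escape unsafe characters, Hive-style.
  def escapePathName(value: String): String =
    value.flatMap { c =>
      if (unsafe.contains(c) || c < ' ') f"%%${c.toInt}%02X" else c.toString
    }

  def partitionSegment(column: String, value: String): String =
    s"${escapePathName(column)}=${escapePathName(value)}"

  def main(args: Array[String]): Unit = {
    // The value from the report, A$\=%, must not end up verbatim in the path.
    println(partitionSegment("B", "A$\\=%"))   // B=A$%5C%3D%25
  }
}
{code}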






[jira] [Assigned] (SPARK-18635) Partition name/values not escaped correctly in some cases

2016-11-29 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18635:


Assignee: (was: Apache Spark)

> Partition name/values not escaped correctly in some cases
> -
>
> Key: SPARK-18635
> URL: https://issues.apache.org/jira/browse/SPARK-18635
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Eric Liang
>Priority: Critical
>
> For example, the following command does not insert data properly into the 
> table
> {code}
> spark.sqlContext.range(10).selectExpr("id", "id as A", "'A$\\=%' as B")
>   .write.partitionBy("A", "B").mode("overwrite").saveAsTable("testy")
> {code}






[jira] [Assigned] (SPARK-18635) Partition name/values not escaped correctly in some cases

2016-11-29 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18635:


Assignee: Apache Spark

> Partition name/values not escaped correctly in some cases
> -
>
> Key: SPARK-18635
> URL: https://issues.apache.org/jira/browse/SPARK-18635
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Eric Liang
>Assignee: Apache Spark
>Priority: Critical
>
> For example, the following command does not insert data properly into the 
> table
> {code}
> spark.sqlContext.range(10).selectExpr("id", "id as A", "'A$\\=%' as B")
>   .write.partitionBy("A", "B").mode("overwrite").saveAsTable("testy")
> {code}






[jira] [Assigned] (SPARK-18639) Build only a single pip package

2016-11-29 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18639:


Assignee: Apache Spark  (was: Reynold Xin)

> Build only a single pip package
> ---
>
> Key: SPARK-18639
> URL: https://issues.apache.org/jira/browse/SPARK-18639
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Reporter: Reynold Xin
>Assignee: Apache Spark
>
> We currently build 5 separate pip binary tarballs, doubling the release 
> script runtime. It'd be better to build just one, especially for use cases 
> that only use Spark locally.






[jira] [Assigned] (SPARK-18639) Build only a single pip package

2016-11-29 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18639:


Assignee: Reynold Xin  (was: Apache Spark)

> Build only a single pip package
> ---
>
> Key: SPARK-18639
> URL: https://issues.apache.org/jira/browse/SPARK-18639
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> We currently build 5 separate pip binary tarballs, doubling the release 
> script runtime. It'd be better to build just one, especially for use cases 
> that only use Spark locally.






[jira] [Commented] (SPARK-18639) Build only a single pip package

2016-11-29 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15707258#comment-15707258
 ] 

Apache Spark commented on SPARK-18639:
--

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/16072

> Build only a single pip package
> ---
>
> Key: SPARK-18639
> URL: https://issues.apache.org/jira/browse/SPARK-18639
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> We currently build 5 separate pip binary tarballs, doubling the release 
> script runtime. It'd be better to build just one, especially for use cases 
> that only use Spark locally.






[jira] [Commented] (SPARK-18640) Fix minor synchronization issue in TaskSchedulerImpl.runningTasksByExecutors

2016-11-29 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15707265#comment-15707265
 ] 

Apache Spark commented on SPARK-18640:
--

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/16073

> Fix minor synchronization issue in TaskSchedulerImpl.runningTasksByExecutors
> 
>
> Key: SPARK-18640
> URL: https://issues.apache.org/jira/browse/SPARK-18640
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Minor
>
> The method TaskSchedulerImpl.runningTasksByExecutors() accesses the mutable 
> executorIdToRunningTaskIds map without proper synchronization. We should fix 
> this.
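A minimal sketch of the kind of fix this calls for, using a simplified stand-in for the scheduler (the names below are assumptions, not the real class): the read path takes the same lock as the write path and hands back an immutable snapshot.

{code}
import scala.collection.mutable

class SchedulerSketch {
  // Guarded by `this`: writers already synchronize, so readers must too.
  private val executorIdToRunningTaskIds =
    mutable.HashMap[String, mutable.HashSet[Long]]()

  def taskStarted(executorId: String, taskId: Long): Unit = synchronized {
    executorIdToRunningTaskIds.getOrElseUpdate(executorId, mutable.HashSet[Long]()) += taskId
  }

  // An unsynchronized read could observe the map mid-update; take the lock
  // and return an immutable copy instead of exposing mutable internals.
  def runningTasksByExecutors: Map[String, Int] = synchronized {
    executorIdToRunningTaskIds.map { case (exec, tasks) => exec -> tasks.size }.toMap
  }
}
{code}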






[jira] [Assigned] (SPARK-18145) Update documentation for hive partition management in 2.1

2016-11-29 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18145:


Assignee: Apache Spark

> Update documentation for hive partition management in 2.1
> -
>
> Key: SPARK-18145
> URL: https://issues.apache.org/jira/browse/SPARK-18145
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Eric Liang
>Assignee: Apache Spark
>







[jira] [Commented] (SPARK-18145) Update documentation for hive partition management in 2.1

2016-11-29 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15707288#comment-15707288
 ] 

Apache Spark commented on SPARK-18145:
--

User 'ericl' has created a pull request for this issue:
https://github.com/apache/spark/pull/16074

> Update documentation for hive partition management in 2.1
> -
>
> Key: SPARK-18145
> URL: https://issues.apache.org/jira/browse/SPARK-18145
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Eric Liang
>







[jira] [Assigned] (SPARK-18145) Update documentation for hive partition management in 2.1

2016-11-29 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18145:


Assignee: (was: Apache Spark)

> Update documentation for hive partition management in 2.1
> -
>
> Key: SPARK-18145
> URL: https://issues.apache.org/jira/browse/SPARK-18145
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Eric Liang
>







[jira] [Commented] (SPARK-18516) Separate instantaneous state from progress performance statistics

2016-11-29 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15707399#comment-15707399
 ] 

Apache Spark commented on SPARK-18516:
--

User 'tdas' has created a pull request for this issue:
https://github.com/apache/spark/pull/16075

> Separate instantaneous state from progress performance statistics
> -
>
> Key: SPARK-18516
> URL: https://issues.apache.org/jira/browse/SPARK-18516
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Reporter: Michael Armbrust
>Assignee: Michael Armbrust
>Priority: Blocker
> Fix For: 2.1.0
>
>
> There are two types of information that you want to be able to extract from a 
> running query: instantaneous _status_ and performance metrics gathered as the 
> query makes _progress_.
> Today, these are conflated in a single {{StreamingQueryStatus}} object.  The 
> downside of this approach is that a user now needs to reason about what state 
> the query is in any time they retrieve a status object.  Fields like 
> {{statusMessage}} don't appear in updates that come from the listener bus.  
> Similarly, {{inputRate}}/{{processingRate}} statistics are usually {{0}} when 
> you retrieve a status object from the query itself.
> I propose we make the following changes:
>  - Make {{status}} report only instantaneous things, such as whether data is 
> available or a human-readable message about what phase we are currently in.
>  - Have a separate {{progress}} message that we report for each trigger with 
> the other performance information that lives in status today.  You should be 
> able to easily retrieve a configurable number of the most recent progress 
> messages instead of just the most recent one.
> While we are making these changes, I propose that we also change {{id}} to be 
> a globally unique identifier rather than a JVM-unique one.  Without this, it's 
> hard to correlate performance across restarts.
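As a rough sketch of the proposed separation (the type and field names below are assumptions for illustration, not the final API), instantaneous state and per-trigger metrics would live in two distinct types:

{code}
import java.util.UUID

// Instantaneous state: cheap to read at any time, no rates involved.
case class QueryStatus(
    message: String,          // human-readable phase, e.g. "Waiting for data"
    isDataAvailable: Boolean,
    isTriggerActive: Boolean)

// Per-trigger performance report; the query keeps the N most recent of these.
case class QueryProgress(
    id: UUID,                 // globally unique, stable across restarts
    batchId: Long,
    inputRowsPerSecond: Double,
    processedRowsPerSecond: Double,
    durationMs: Map[String, Long])

// Sketch of how a running query would expose the two kinds of information.
trait RunningQuery {
  def status: QueryStatus                  // current state only
  def recentProgress: Seq[QueryProgress]   // configurable-length history
  def lastProgress: Option[QueryProgress]
}
{code}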






[jira] [Commented] (SPARK-17732) ALTER TABLE DROP PARTITION should support comparators

2016-11-29 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15707594#comment-15707594
 ] 

Apache Spark commented on SPARK-17732:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/15987

> ALTER TABLE DROP PARTITION should support comparators
> -
>
> Key: SPARK-17732
> URL: https://issues.apache.org/jira/browse/SPARK-17732
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>
> This issue aims to support `comparators`, e.g. '<', '<=', '>', '>=', again in 
> Apache Spark 2.0 for backward compatibility.
> *Spark 1.6.2*
> {code}
> scala> sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)")
> res0: org.apache.spark.sql.DataFrame = [result: string]
> scala> sql("ALTER TABLE sales DROP PARTITION (country < 'KR')")
> res1: org.apache.spark.sql.DataFrame = [result: string]
> {code}
> *Spark 2.0*
> {code}
> scala> sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)")
> res0: org.apache.spark.sql.DataFrame = []
> scala> sql("ALTER TABLE sales DROP PARTITION (country < 'KR')")
> org.apache.spark.sql.catalyst.parser.ParseException:
> mismatched input '<' expecting {')', ','}(line 1, pos 42)
> {code}






[jira] [Commented] (SPARK-18324) ML, Graph 2.1 QA: Programming guide update and migration guide

2016-11-29 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15707675#comment-15707675
 ] 

Apache Spark commented on SPARK-18324:
--

User 'yanboliang' has created a pull request for this issue:
https://github.com/apache/spark/pull/16076

> ML, Graph 2.1 QA: Programming guide update and migration guide
> --
>
> Key: SPARK-18324
> URL: https://issues.apache.org/jira/browse/SPARK-18324
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, GraphX, ML, MLlib
>Reporter: Joseph K. Bradley
>Assignee: Yanbo Liang
>Priority: Critical
>
> Before the release, we need to update the MLlib and GraphX Programming 
> Guides.  Updates will include:
> * Add migration guide subsection.
> ** Use the results of the QA audit JIRAs and [SPARK-17692].
> * Check phrasing, especially in main sections (for outdated items such as "In 
> this release, ...")






[jira] [Commented] (SPARK-18643) SparkR hangs at session start when installed as a package without SPARK_HOME set

2016-11-29 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15707701#comment-15707701
 ] 

Apache Spark commented on SPARK-18643:
--

User 'felixcheung' has created a pull request for this issue:
https://github.com/apache/spark/pull/16077

> SparkR hangs at session start when installed as a package without SPARK_HOME 
> set
> 
>
> Key: SPARK-18643
> URL: https://issues.apache.org/jira/browse/SPARK-18643
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.1.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>Priority: Critical
>
> 1) Install SparkR from the source package, i.e.
> R CMD INSTALL SparkR_2.1.0.tar.gz
> 2) Start SparkR (not from the sparkR shell)
> library(SparkR)
> sparkR.session()
> Notice that SparkR hangs when it cannot find spark-submit to launch the JVM 
> backend.
> If SparkR is running as a package and it has previously downloaded the Spark 
> jar, it should be able to run as before without having to set SPARK_HOME. 
> Basically, with this bug the auto-installed Spark only works in the first 
> session.
> This seems to be a regression from the earlier behavior.






[jira] [Assigned] (SPARK-18643) SparkR hangs at session start when installed as a package without SPARK_HOME set

2016-11-29 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18643:


Assignee: Felix Cheung  (was: Apache Spark)

> SparkR hangs at session start when installed as a package without SPARK_HOME 
> set
> 
>
> Key: SPARK-18643
> URL: https://issues.apache.org/jira/browse/SPARK-18643
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.1.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>Priority: Critical
>
> 1) Install SparkR from the source package, i.e.
> R CMD INSTALL SparkR_2.1.0.tar.gz
> 2) Start SparkR (not from the sparkR shell)
> library(SparkR)
> sparkR.session()
> Notice that SparkR hangs when it cannot find spark-submit to launch the JVM 
> backend.
> If SparkR is running as a package and it has previously downloaded the Spark 
> jar, it should be able to run as before without having to set SPARK_HOME. 
> Basically, with this bug the auto-installed Spark only works in the first 
> session.
> This seems to be a regression from the earlier behavior.






[jira] [Assigned] (SPARK-18643) SparkR hangs at session start when installed as a package without SPARK_HOME set

2016-11-29 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18643:


Assignee: Apache Spark  (was: Felix Cheung)

> SparkR hangs at session start when installed as a package without SPARK_HOME 
> set
> 
>
> Key: SPARK-18643
> URL: https://issues.apache.org/jira/browse/SPARK-18643
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.1.0
>Reporter: Felix Cheung
>Assignee: Apache Spark
>Priority: Critical
>
> 1) Install SparkR from the source package, i.e.
> R CMD INSTALL SparkR_2.1.0.tar.gz
> 2) Start SparkR (not from the sparkR shell)
> library(SparkR)
> sparkR.session()
> Notice that SparkR hangs when it cannot find spark-submit to launch the JVM 
> backend.
> If SparkR is running as a package and it has previously downloaded the Spark 
> jar, it should be able to run as before without having to set SPARK_HOME. 
> Basically, with this bug the auto-installed Spark only works in the first 
> session.
> This seems to be a regression from the earlier behavior.






[jira] [Commented] (SPARK-18471) In treeAggregate, generate (big) zeros instead of sending them.

2016-11-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15708004#comment-15708004
 ] 

Apache Spark commented on SPARK-18471:
--

User 'AnthonyTruchet' has created a pull request for this issue:
https://github.com/apache/spark/pull/16078

> In treeAggregate, generate (big) zeros instead of sending them.
> ---
>
> Key: SPARK-18471
> URL: https://issues.apache.org/jira/browse/SPARK-18471
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, Spark Core
>Reporter: Anthony Truchet
>Priority: Minor
>
> When using an optimization routine like LBFGS, treeAggregate currently sends 
> the zero vector as part of the closure. This zero can be huge (e.g. ML vectors 
> with millions of zeros) but can be easily generated.
> Several options are possible (upcoming patches to come soon for some of them).
> One is to provide a treeAggregateWithZeroGenerator method (either in core or 
> in MLlib) which wraps treeAggregate in an Option and generates the zero if 
> None.
> Another one is to rewrite treeAggregate to wrap an underlying implementation 
> which uses a zero generator directly.
> There might be other, better alternatives we have not spotted...
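A minimal sketch of the first option, under the assumption of a hypothetical {{treeAggregateWithZeroGenerator}} helper (this is not the submitted patch): the aggregation runs over {{Option}} values so the zero is generated inside each task rather than shipped in the closure.

{code}
import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

object TreeAggregateWithZero {
  // Hypothetical helper (illustrative only): `zeroGen` builds the potentially
  // huge zero value lazily on the executor, so only the small generator
  // function is serialized with the closure.
  def treeAggregateWithZeroGenerator[T, U: ClassTag](rdd: RDD[T])(zeroGen: () => U)(
      seqOp: (U, T) => U, combOp: (U, U) => U, depth: Int = 2): U = {
    val result = rdd.treeAggregate(Option.empty[U])(
      (acc, t) => Some(seqOp(acc.getOrElse(zeroGen()), t)),
      {
        case (Some(a), Some(b)) => Some(combOp(a, b))
        case (a, b)             => a.orElse(b)
      },
      depth)
    // If the RDD was empty, no partition ever generated a zero; do it here.
    result.getOrElse(zeroGen())
  }
}
{code}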






[jira] [Assigned] (SPARK-18645) spark-daemon.sh arguments error lead to throws Unrecognized option

2016-11-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18645:


Assignee: (was: Apache Spark)

> spark-daemon.sh arguments error lead to throws Unrecognized option
> --
>
> Key: SPARK-18645
> URL: https://issues.apache.org/jira/browse/SPARK-18645
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 2.1.0
>Reporter: Yuming Wang
>
> {{start-thriftserver.sh}} can reproduce this:
> {noformat}
> [root@dev spark]# ./sbin/start-thriftserver.sh --conf 
> 'spark.driver.extraJavaOptions=-XX:+UseG1GC -XX:-HeapDumpOnOutOfMemoryError 
> -XX:HeapDumpPath=/tmp' 
> starting org.apache.spark.sql.hive.thriftserver.HiveThriftServer2, logging to 
> /tmp/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-dev.out
> failed to launch nice -n 0 bash 
> /opt/cloudera/parcels/SPARK-2.1.0-cdh5.4.3.d20161129-21.04.38/lib/spark/bin/spark-submit
>  --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --name 
> Thrift JDBC/ODBC Server --conf spark.driver.extraJavaOptions=-XX:+UseG1GC 
> -XX:-HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp:
>   Error starting HiveServer2 with given arguments: 
>   Unrecognized option: -XX:-HeapDumpOnOutOfMemoryError
> full log in 
> /tmp/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-dev.out
> {noformat}






[jira] [Assigned] (SPARK-18645) spark-daemon.sh arguments error lead to throws Unrecognized option

2016-11-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18645:


Assignee: Apache Spark

> spark-daemon.sh arguments error lead to throws Unrecognized option
> --
>
> Key: SPARK-18645
> URL: https://issues.apache.org/jira/browse/SPARK-18645
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 2.1.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>
> {{start-thriftserver.sh}} can reproduce this:
> {noformat}
> [root@dev spark]# ./sbin/start-thriftserver.sh --conf 
> 'spark.driver.extraJavaOptions=-XX:+UseG1GC -XX:-HeapDumpOnOutOfMemoryError 
> -XX:HeapDumpPath=/tmp' 
> starting org.apache.spark.sql.hive.thriftserver.HiveThriftServer2, logging to 
> /tmp/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-dev.out
> failed to launch nice -n 0 bash 
> /opt/cloudera/parcels/SPARK-2.1.0-cdh5.4.3.d20161129-21.04.38/lib/spark/bin/spark-submit
>  --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --name 
> Thrift JDBC/ODBC Server --conf spark.driver.extraJavaOptions=-XX:+UseG1GC 
> -XX:-HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp:
>   Error starting HiveServer2 with given arguments: 
>   Unrecognized option: -XX:-HeapDumpOnOutOfMemoryError
> full log in 
> /tmp/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-dev.out
> {noformat}






[jira] [Commented] (SPARK-18645) spark-daemon.sh arguments error lead to throws Unrecognized option

2016-11-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15708219#comment-15708219
 ] 

Apache Spark commented on SPARK-18645:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/16079

> spark-daemon.sh arguments error lead to throws Unrecognized option
> --
>
> Key: SPARK-18645
> URL: https://issues.apache.org/jira/browse/SPARK-18645
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 2.1.0
>Reporter: Yuming Wang
>
> {{start-thriftserver.sh}} can reproduce this:
> {noformat}
> [root@dev spark]# ./sbin/start-thriftserver.sh --conf 
> 'spark.driver.extraJavaOptions=-XX:+UseG1GC -XX:-HeapDumpOnOutOfMemoryError 
> -XX:HeapDumpPath=/tmp' 
> starting org.apache.spark.sql.hive.thriftserver.HiveThriftServer2, logging to 
> /tmp/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-dev.out
> failed to launch nice -n 0 bash 
> /opt/cloudera/parcels/SPARK-2.1.0-cdh5.4.3.d20161129-21.04.38/lib/spark/bin/spark-submit
>  --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --name 
> Thrift JDBC/ODBC Server --conf spark.driver.extraJavaOptions=-XX:+UseG1GC 
> -XX:-HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp:
>   Error starting HiveServer2 with given arguments: 
>   Unrecognized option: -XX:-HeapDumpOnOutOfMemoryError
> full log in 
> /tmp/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-dev.out
> {noformat}






[jira] [Assigned] (SPARK-18647) do not put provider in table properties for Hive serde table

2016-11-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18647:


Assignee: Apache Spark  (was: Wenchen Fan)

> do not put provider in table properties for Hive serde table
> 
>
> Key: SPARK-18647
> URL: https://issues.apache.org/jira/browse/SPARK-18647
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>







[jira] [Commented] (SPARK-18647) do not put provider in table properties for Hive serde table

2016-11-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15708324#comment-15708324
 ] 

Apache Spark commented on SPARK-18647:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/16080

> do not put provider in table properties for Hive serde table
> 
>
> Key: SPARK-18647
> URL: https://issues.apache.org/jira/browse/SPARK-18647
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>







[jira] [Assigned] (SPARK-18647) do not put provider in table properties for Hive serde table

2016-11-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18647:


Assignee: Wenchen Fan  (was: Apache Spark)

> do not put provider in table properties for Hive serde table
> 
>
> Key: SPARK-18647
> URL: https://issues.apache.org/jira/browse/SPARK-18647
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>







[jira] [Assigned] (SPARK-18475) Be able to provide higher parallelization for StructuredStreaming Kafka Source

2016-11-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18475:


Assignee: (was: Apache Spark)

> Be able to provide higher parallelization for StructuredStreaming Kafka Source
> --
>
> Key: SPARK-18475
> URL: https://issues.apache.org/jira/browse/SPARK-18475
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.0.2, 2.1.0
>Reporter: Burak Yavuz
>
> Right now the Structured Streaming Kafka source creates as many Spark tasks as 
> there are TopicPartitions that we're going to read from Kafka.
> This doesn't work well when we have data skew, and there is no reason why we 
> shouldn't be able to increase parallelism further, i.e. have multiple Spark 
> tasks reading from the same Kafka TopicPartition.
> What this means is that we won't be able to use the "CachedKafkaConsumer" for 
> what it is designed for (being cached) in this use case, but the extra 
> overhead is worth it for handling data skew and increasing parallelism, 
> especially in ETL use cases.
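For illustration, a self-contained sketch of the core idea: splitting one TopicPartition's offset range into several slices so a skewed partition is read by multiple Spark tasks. The {{OffsetRange}} class and the {{minTasks}} knob here are assumptions for exposition, not the connector's API.

{code}
// Hypothetical, simplified offset range for one Kafka TopicPartition.
case class OffsetRange(topic: String, partition: Int, fromOffset: Long, untilOffset: Long) {
  def size: Long = untilOffset - fromOffset
}

object SplitRanges {
  // Split each TopicPartition's range into roughly equal slices so the total
  // number of Spark tasks is at least `minTasks`, spreading out skewed partitions.
  def split(ranges: Seq[OffsetRange], minTasks: Int): Seq[OffsetRange] = {
    val total = ranges.map(_.size).sum.max(1L)
    ranges.flatMap { r =>
      val slices = math.max(1, math.round(minTasks * r.size.toDouble / total).toInt)
      val step = math.max(1L, r.size / slices)
      (r.fromOffset until r.untilOffset by step).map { start =>
        r.copy(fromOffset = start, untilOffset = math.min(start + step, r.untilOffset))
      }
    }
  }

  def main(args: Array[String]): Unit = {
    // The skewed partition is split into several tasks; the small one keeps one.
    val ranges = Seq(OffsetRange("t", 0, 0, 1000000), OffsetRange("t", 1, 0, 100))
    split(ranges, minTasks = 4).foreach(println)
  }
}
{code}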






[jira] [Assigned] (SPARK-18475) Be able to provide higher parallelization for StructuredStreaming Kafka Source

2016-11-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18475:


Assignee: Apache Spark

> Be able to provide higher parallelization for StructuredStreaming Kafka Source
> --
>
> Key: SPARK-18475
> URL: https://issues.apache.org/jira/browse/SPARK-18475
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.0.2, 2.1.0
>Reporter: Burak Yavuz
>Assignee: Apache Spark
>
> Right now the Structured Streaming Kafka source creates as many Spark tasks as 
> there are TopicPartitions that we're going to read from Kafka.
> This doesn't work well when we have data skew, and there is no reason why we 
> shouldn't be able to increase parallelism further, i.e. have multiple Spark 
> tasks reading from the same Kafka TopicPartition.
> What this means is that we won't be able to use the "CachedKafkaConsumer" for 
> what it is designed for (being cached) in this use case, but the extra 
> overhead is worth it for handling data skew and increasing parallelism, 
> especially in ETL use cases.






[jira] [Assigned] (SPARK-18652) Include the example data with the pyspark package

2016-11-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18652:


Assignee: Apache Spark

> Include the example data with the pyspark package
> -
>
> Key: SPARK-18652
> URL: https://issues.apache.org/jira/browse/SPARK-18652
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Reporter: Shuai Lin
>Assignee: Apache Spark
>Priority: Minor
>
> Since we already include the python examples in the pyspark package, we 
> should include the example data with it as well.






[jira] [Assigned] (SPARK-18652) Include the example data with the pyspark package

2016-11-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18652:


Assignee: (was: Apache Spark)

> Include the example data with the pyspark package
> -
>
> Key: SPARK-18652
> URL: https://issues.apache.org/jira/browse/SPARK-18652
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Reporter: Shuai Lin
>Priority: Minor
>
> Since we already include the python examples in the pyspark package, we 
> should include the example data with it as well.






[jira] [Commented] (SPARK-18652) Include the example data with the pyspark package

2016-11-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15709159#comment-15709159
 ] 

Apache Spark commented on SPARK-18652:
--

User 'lins05' has created a pull request for this issue:
https://github.com/apache/spark/pull/16082

> Include the example data with the pyspark package
> -
>
> Key: SPARK-18652
> URL: https://issues.apache.org/jira/browse/SPARK-18652
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Reporter: Shuai Lin
>Priority: Minor
>
> Since we already include the python examples in the pyspark package, we 
> should include the example data with it as well.






[jira] [Commented] (SPARK-18097) Can't drop a table from Hive if the schema is corrupt

2016-11-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15709373#comment-15709373
 ] 

Apache Spark commented on SPARK-18097:
--

User 'jayadevanmurali' has created a pull request for this issue:
https://github.com/apache/spark/pull/16083

> Can't drop a table from Hive if the schema is corrupt
> -
>
> Key: SPARK-18097
> URL: https://issues.apache.org/jira/browse/SPARK-18097
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1
>Reporter: Davies Liu
>
> When the schema of a Hive table is broken, we can't drop the table using 
> Spark SQL. For example:
> {code}
> Error in SQL statement: QueryExecutionException: FAILED: 
> IllegalArgumentException Error: > expected at the position 10 of 
> 'ss:string:struct<>' but ':' is found.
> at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:336)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:331)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseType(TypeInfoUtils.java:480)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseType(TypeInfoUtils.java:447)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseType(TypeInfoUtils.java:481)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseType(TypeInfoUtils.java:447)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseType(TypeInfoUtils.java:481)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseTypeInfos(TypeInfoUtils.java:305)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils.getTypeInfosFromTypeString(TypeInfoUtils.java:754)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe.initialize(ParquetHiveSerDe.java:104)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:339)
>   at 
> org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:288)
>   at 
> org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:194)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1017)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1.apply(HiveClientImpl.scala:353)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1.apply(HiveClientImpl.scala:351)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:280)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:227)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:226)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:269)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getTableOption(HiveClientImpl.scala:351)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$tableExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:228)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$tableExists$1.apply(HiveExternalCatalog.scala:228)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$tableExists$1.apply(HiveExternalCatalog.scala:228)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:72)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.tableExists(HiveExternalCatalog.scala:227)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:255)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.requireTableExists(SessionCatalog.scala:126)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableMetadata(SessionCatalog.scala:267)
>   at 
> org.apache.spark.sql.execution.command.ShowCreateTableCommand.run(tables.scala:753)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOper

[jira] [Assigned] (SPARK-18097) Can't drop a table from Hive if the schema is corrupt

2016-11-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18097:


Assignee: Apache Spark

> Can't drop a table from Hive if the schema is corrupt
> -
>
> Key: SPARK-18097
> URL: https://issues.apache.org/jira/browse/SPARK-18097
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1
>Reporter: Davies Liu
>Assignee: Apache Spark
>
> When the schema of a Hive table is broken, we can't drop the table using 
> Spark SQL. For example:
> {code}
> Error in SQL statement: QueryExecutionException: FAILED: 
> IllegalArgumentException Error: > expected at the position 10 of 
> 'ss:string:struct<>' but ':' is found.
> at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:336)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:331)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseType(TypeInfoUtils.java:480)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseType(TypeInfoUtils.java:447)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseType(TypeInfoUtils.java:481)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseType(TypeInfoUtils.java:447)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseType(TypeInfoUtils.java:481)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseTypeInfos(TypeInfoUtils.java:305)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils.getTypeInfosFromTypeString(TypeInfoUtils.java:754)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe.initialize(ParquetHiveSerDe.java:104)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:339)
>   at 
> org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:288)
>   at 
> org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:194)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1017)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1.apply(HiveClientImpl.scala:353)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1.apply(HiveClientImpl.scala:351)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:280)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:227)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:226)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:269)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getTableOption(HiveClientImpl.scala:351)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$tableExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:228)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$tableExists$1.apply(HiveExternalCatalog.scala:228)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$tableExists$1.apply(HiveExternalCatalog.scala:228)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:72)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.tableExists(HiveExternalCatalog.scala:227)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:255)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.requireTableExists(SessionCatalog.scala:126)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableMetadata(SessionCatalog.scala:267)
>   at 
> org.apache.spark.sql.execution.command.ShowCreateTableCommand.run(tables.scala:753)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPl

[jira] [Assigned] (SPARK-18097) Can't drop a table from Hive if the schema is corrupt

2016-11-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18097:


Assignee: (was: Apache Spark)

> Can't drop a table from Hive if the schema is corrupt
> -
>
> Key: SPARK-18097
> URL: https://issues.apache.org/jira/browse/SPARK-18097
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1
>Reporter: Davies Liu
>
> When the schema of a Hive table is broken, we can't drop the table using 
> Spark SQL. For example:
> {code}
> Error in SQL statement: QueryExecutionException: FAILED: 
> IllegalArgumentException Error: > expected at the position 10 of 
> 'ss:string:struct<>' but ':' is found.
> at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:336)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:331)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseType(TypeInfoUtils.java:480)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseType(TypeInfoUtils.java:447)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseType(TypeInfoUtils.java:481)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseType(TypeInfoUtils.java:447)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseType(TypeInfoUtils.java:481)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseTypeInfos(TypeInfoUtils.java:305)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils.getTypeInfosFromTypeString(TypeInfoUtils.java:754)
>   at 
> org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe.initialize(ParquetHiveSerDe.java:104)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:339)
>   at 
> org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:288)
>   at 
> org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:194)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1017)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1.apply(HiveClientImpl.scala:353)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getTableOption$1.apply(HiveClientImpl.scala:351)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:280)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:227)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:226)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:269)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getTableOption(HiveClientImpl.scala:351)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$tableExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:228)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$tableExists$1.apply(HiveExternalCatalog.scala:228)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$tableExists$1.apply(HiveExternalCatalog.scala:228)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:72)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.tableExists(HiveExternalCatalog.scala:227)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:255)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.requireTableExists(SessionCatalog.scala:126)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableMetadata(SessionCatalog.scala:267)
>   at 
> org.apache.spark.sql.execution.command.ShowCreateTableCommand.run(tables.scala:753)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
>   at 

[jira] [Assigned] (SPARK-18654) JacksonParser.makeRootConverter has effectively unreachable code

2016-11-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18654:


Assignee: Apache Spark

> JacksonParser.makeRootConverter has effectively unreachable code
> 
>
> Key: SPARK-18654
> URL: https://issues.apache.org/jira/browse/SPARK-18654
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Nathan Howell
>Assignee: Apache Spark
>Priority: Minor
>
> {{JacksonParser.makeRootConverter}} currently takes a {{DataType}} but is 
> only called with a {{StructType}}. Revising the method to only accept a 
> {{StructType}} allows us to remove some pattern matches.
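A schematic sketch of the proposed change (the types and signatures below are simplified stand-ins, not the actual JacksonParser code): narrowing the parameter from {{DataType}} to {{StructType}} makes the unreachable branches of the match disappear.

{code}
// Simplified stand-ins for Spark SQL's type hierarchy.
sealed trait DataType
case class StructType(fieldNames: Seq[String]) extends DataType
case class ArrayType(element: DataType) extends DataType

object RootConverterSketch {
  type Converter = String => Any

  // Before: accepts any DataType, but only the StructType branch is ever taken,
  // so the remaining cases are effectively unreachable.
  def makeRootConverterBefore(dt: DataType): Converter = dt match {
    case st: StructType => json => s"struct(${st.fieldNames.mkString(",")}) from $json"
    case _: ArrayType   => json => s"array from $json"   // never reached in practice
    case _              => json => json                  // never reached in practice
  }

  // After: the signature states the real contract and the match goes away.
  def makeRootConverterAfter(st: StructType): Converter =
    json => s"struct(${st.fieldNames.mkString(",")}) from $json"
}
{code}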






[jira] [Commented] (SPARK-18654) JacksonParser.makeRootConverter has effectively unreachable code

2016-11-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15709422#comment-15709422
 ] 

Apache Spark commented on SPARK-18654:
--

User 'NathanHowell' has created a pull request for this issue:
https://github.com/apache/spark/pull/16084

> JacksonParser.makeRootConverter has effectively unreachable code
> 
>
> Key: SPARK-18654
> URL: https://issues.apache.org/jira/browse/SPARK-18654
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Nathan Howell
>Priority: Minor
>
> {{JacksonParser.makeRootConverter}} currently takes a {{DataType}} but is 
> only called with a {{StructType}}. Revising the method to only accept a 
> {{StructType}} allows us to remove some pattern matches.






[jira] [Assigned] (SPARK-18654) JacksonParser.makeRootConverter has effectively unreachable code

2016-11-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18654:


Assignee: (was: Apache Spark)

> JacksonParser.makeRootConverter has effectively unreachable code
> 
>
> Key: SPARK-18654
> URL: https://issues.apache.org/jira/browse/SPARK-18654
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Nathan Howell
>Priority: Minor
>
> {{JacksonParser.makeRootConverter}} currently takes a {{DataType}} but is 
> only called with a {{StructType}}. Revising the method to only accept a 
> {{StructType}} allows us to remove some pattern matches.






[jira] [Commented] (SPARK-18655) Ignore Structured Streaming 2.0.2 logs in history server

2016-11-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15709454#comment-15709454
 ] 

Apache Spark commented on SPARK-18655:
--

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/16085

> Ignore Structured Streaming 2.0.2 logs in history server
> 
>
> Key: SPARK-18655
> URL: https://issues.apache.org/jira/browse/SPARK-18655
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.1.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>Priority: Blocker
> Fix For: 2.1.0
>
>
> SPARK-18516 changes the event log format of Structured Streaming. We should 
> make sure our changes do not break the history server.






[jira] [Assigned] (SPARK-18655) Ignore Structured Streaming 2.0.2 logs in history server

2016-11-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18655:


Assignee: Shixiong Zhu  (was: Apache Spark)

> Ignore Structured Streaming 2.0.2 logs in history server
> 
>
> Key: SPARK-18655
> URL: https://issues.apache.org/jira/browse/SPARK-18655
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.1.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>Priority: Blocker
> Fix For: 2.1.0
>
>
> SPARK-18516 changes the event log format of Structured Streaming. We should 
> make sure our changes do not break the history server.






[jira] [Assigned] (SPARK-18655) Ignore Structured Streaming 2.0.2 logs in history server

2016-11-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18655:


Assignee: Apache Spark  (was: Shixiong Zhu)

> Ignore Structured Streaming 2.0.2 logs in history server
> 
>
> Key: SPARK-18655
> URL: https://issues.apache.org/jira/browse/SPARK-18655
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.1.0
>Reporter: Shixiong Zhu
>Assignee: Apache Spark
>Priority: Blocker
> Fix For: 2.1.0
>
>
> SPARK-18516 changes the event log format of Structured Streaming. We should 
> make sure our changes do not break the history server.






[jira] [Assigned] (SPARK-18653) Dataset.show() generates incorrect padding for Unicode Character

2016-11-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18653:


Assignee: Apache Spark

> Dataset.show() generates incorrect padding for Unicode Character
> 
>
> Key: SPARK-18653
> URL: https://issues.apache.org/jira/browse/SPARK-18653
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Kazuaki Ishizaki
>Assignee: Apache Spark
>
> The following program generates incorrect space padding for 
> {{Dataset.show()}} when a column name or column value contains Unicode 
> characters.
> Program
> {code:java}
> case class UnicodeCaseClass(整数: Int, 実数: Double, s: String)
> val ds = Seq(UnicodeCaseClass(1, 1.1, "文字列1")).toDS
> ds.show
> {code}
> Output
> {code}
> +---+---++
> | 整数| 実数|   s|
> +---+---++
> |  1|1.1|文字列1|
> +---+---++
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18653) Dataset.show() generates incorrect padding for Unicode Character

2016-11-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18653:


Assignee: (was: Apache Spark)

> Dataset.show() generates incorrect padding for Unicode Character
> 
>
> Key: SPARK-18653
> URL: https://issues.apache.org/jira/browse/SPARK-18653
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Kazuaki Ishizaki
>
> The following program generates incorrect space padding for 
> {{Dataset.show()}} when a column name or column value contains Unicode characters.
> Program
> {code:java}
> case class UnicodeCaseClass(整数: Int, 実数: Double, s: String)
> val ds = Seq(UnicodeCaseClass(1, 1.1, "文字列1")).toDS
> ds.show
> {code}
> Output
> {code}
> +---+---++
> | 整数| 実数|   s|
> +---+---++
> |  1|1.1|文字列1|
> +---+---++
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18653) Dataset.show() generates incorrect padding for Unicode Character

2016-11-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15709468#comment-15709468
 ] 

Apache Spark commented on SPARK-18653:
--

User 'kiszk' has created a pull request for this issue:
https://github.com/apache/spark/pull/16086

> Dataset.show() generates incorrect padding for Unicode Character
> 
>
> Key: SPARK-18653
> URL: https://issues.apache.org/jira/browse/SPARK-18653
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Kazuaki Ishizaki
>
> The following program generates incorrect space padding for 
> {{Dataset.show()}} when a column name or column value contains Unicode characters.
> Program
> {code:java}
> case class UnicodeCaseClass(整数: Int, 実数: Double, s: String)
> val ds = Seq(UnicodeCaseClass(1, 1.1, "文字列1")).toDS
> ds.show
> {code}
> Output
> {code}
> +---+---++
> | 整数| 実数|   s|
> +---+---++
> |  1|1.1|文字列1|
> +---+---++
> {code}
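For illustration, padding can be computed from the rendered display width rather than the character count: full-width characters such as the CJK characters above occupy two terminal columns. The helper below is a minimal sketch of that idea with a hand-rolled width table, not the code used by {{Dataset.show()}}.

{code}
// Approximate display width: common full-width ranges count as 2 columns.
def displayWidth(s: String): Int = s.codePoints.toArray.map { cp =>
  val fullWidth =
    (cp >= 0x1100 && cp <= 0x115F) ||  // Hangul Jamo
    (cp >= 0x2E80 && cp <= 0xA4CF) ||  // CJK radicals through Yi
    (cp >= 0xAC00 && cp <= 0xD7A3) ||  // Hangul syllables
    (cp >= 0xF900 && cp <= 0xFAFF) ||  // CJK compatibility ideographs
    (cp >= 0xFF00 && cp <= 0xFF60)     // full-width forms
  if (fullWidth) 2 else 1
}.sum

// Right-align a cell to a target display width.
def padCell(value: String, targetWidth: Int): String =
  " " * math.max(0, targetWidth - displayWidth(value)) + value

// padCell("文字列1", 8) prepends one space (display width 7), whereas a naive
// "%8s".format("文字列1") would prepend four and misalign the table.
{code}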



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18656) org.apache.spark.sql.execution.stat.StatFunctions#multipleApproxQuantiles requires too much memory in case of many columns

2016-11-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18656:


Assignee: (was: Apache Spark)

> org.apache.spark.sql.execution.stat.StatFunctions#multipleApproxQuantiles 
> requires too much memory in case of many columns
> --
>
> Key: SPARK-18656
> URL: https://issues.apache.org/jira/browse/SPARK-18656
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Sina Sohangir
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> org.apache.spark.sql.execution.stat.StatFunctions#multipleApproxQuantiles
> is implemented in a way that causes an out-of-memory error when the number 
> of columns is high.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18656) org.apache.spark.sql.execution.stat.StatFunctions#multipleApproxQuantiles requires too much memory in case of many columns

2016-11-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15709929#comment-15709929
 ] 

Apache Spark commented on SPARK-18656:
--

User 'sinasohangirsc' has created a pull request for this issue:
https://github.com/apache/spark/pull/16087

> org.apache.spark.sql.execution.stat.StatFunctions#multipleApproxQuantiles 
> requires too much memory in case of many columns
> --
>
> Key: SPARK-18656
> URL: https://issues.apache.org/jira/browse/SPARK-18656
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Sina Sohangir
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> org.apache.spark.sql.execution.stat.StatFunctions#multipleApproxQuantiles
> is implemented in a way that causes an out-of-memory error when the number 
> of columns is high.
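Until that is improved, a user-side workaround (shown below as a sketch, not the proposed patch) is to call the public single-column {{approxQuantile}} API once per column, trading extra passes over the data for a much smaller driver-side footprint.

{code}
import org.apache.spark.sql.DataFrame

// Compute approximate quantiles one column at a time.
def quantilesPerColumn(
    df: DataFrame,
    cols: Seq[String],
    probabilities: Array[Double],
    relativeError: Double): Map[String, Array[Double]] = {
  cols.map { c =>
    c -> df.stat.approxQuantile(c, probabilities, relativeError)
  }.toMap
}
{code}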



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18656) org.apache.spark.sql.execution.stat.StatFunctions#multipleApproxQuantiles requires too much memory in case of many columns

2016-11-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18656:


Assignee: Apache Spark

> org.apache.spark.sql.execution.stat.StatFunctions#multipleApproxQuantiles 
> requires too much memory in case of many columns
> --
>
> Key: SPARK-18656
> URL: https://issues.apache.org/jira/browse/SPARK-18656
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Sina Sohangir
>Assignee: Apache Spark
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> org.apache.spark.sql.execution.stat.StatFunctions#multipleApproxQuantiles
> is implemented in a way that causes an out-of-memory error when the number 
> of columns is high.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18659) Incorrect behaviors in overwrite table for datasource tables

2016-11-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18659:


Assignee: Apache Spark

> Incorrect behaviors in overwrite table for datasource tables
> 
>
> Key: SPARK-18659
> URL: https://issues.apache.org/jira/browse/SPARK-18659
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Eric Liang
>Assignee: Apache Spark
>Priority: Blocker
>
> The following test cases fail due to a crash in the Hive client when dropping 
> partitions that don't contain files. The last one deletes too many files due 
> to a partition case-resolution failure.
> {code}
>   test("foo") {
> withTable("test") {
>   spark.range(10)
> .selectExpr("id", "id as A", "'x' as B")
> .write.partitionBy("A", "B").mode("overwrite")
> .saveAsTable("test")
>   spark.sql("insert overwrite table test select id, id, 'x' from 
> range(1)")
>   assert(spark.sql("select * from test").count() == 1)
> }
>   }
>   test("bar") {
> withTable("test") {
>   spark.range(10)
> .selectExpr("id", "id as A", "'x' as B")
> .write.partitionBy("A", "B").mode("overwrite")
> .saveAsTable("test")
>   spark.sql("insert overwrite table test partition (a, b) select id, id, 
> 'x' from range(1)")
>   assert(spark.sql("select * from test").count() == 1)
> }
>   }
>   test("baz") {
> withTable("test") {
>   spark.range(10)
> .selectExpr("id", "id as A", "'x' as B")
> .write.partitionBy("A", "B").mode("overwrite")
> .saveAsTable("test")
>   spark.sql("insert overwrite table test partition (A, B) select id, id, 
> 'x' from range(1)")
>   assert(spark.sql("select * from test").count() == 1)
> }
>   }
>   test("qux") {
> withTable("test") {
>   spark.range(10)
> .selectExpr("id", "id as A", "'x' as B")
> .write.partitionBy("A", "B").mode("overwrite")
> .saveAsTable("test")
>   spark.sql("insert overwrite table test partition (a=1, b) select id, 
> 'x' from range(1)")
>   assert(spark.sql("select * from test").count() == 10)
> }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18659) Incorrect behaviors in overwrite table for datasource tables

2016-11-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18659:


Assignee: (was: Apache Spark)

> Incorrect behaviors in overwrite table for datasource tables
> 
>
> Key: SPARK-18659
> URL: https://issues.apache.org/jira/browse/SPARK-18659
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Eric Liang
>Priority: Blocker
>
> The following test cases fail due to a crash in the Hive client when dropping 
> partitions that don't contain files. The last one deletes too many files due 
> to a partition case-resolution failure.
> {code}
>   test("foo") {
> withTable("test") {
>   spark.range(10)
> .selectExpr("id", "id as A", "'x' as B")
> .write.partitionBy("A", "B").mode("overwrite")
> .saveAsTable("test")
>   spark.sql("insert overwrite table test select id, id, 'x' from 
> range(1)")
>   assert(spark.sql("select * from test").count() == 1)
> }
>   }
>   test("bar") {
> withTable("test") {
>   spark.range(10)
> .selectExpr("id", "id as A", "'x' as B")
> .write.partitionBy("A", "B").mode("overwrite")
> .saveAsTable("test")
>   spark.sql("insert overwrite table test partition (a, b) select id, id, 
> 'x' from range(1)")
>   assert(spark.sql("select * from test").count() == 1)
> }
>   }
>   test("baz") {
> withTable("test") {
>   spark.range(10)
> .selectExpr("id", "id as A", "'x' as B")
> .write.partitionBy("A", "B").mode("overwrite")
> .saveAsTable("test")
>   spark.sql("insert overwrite table test partition (A, B) select id, id, 
> 'x' from range(1)")
>   assert(spark.sql("select * from test").count() == 1)
> }
>   }
>   test("qux") {
> withTable("test") {
>   spark.range(10)
> .selectExpr("id", "id as A", "'x' as B")
> .write.partitionBy("A", "B").mode("overwrite")
> .saveAsTable("test")
>   spark.sql("insert overwrite table test partition (a=1, b) select id, 
> 'x' from range(1)")
>   assert(spark.sql("select * from test").count() == 10)
> }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18659) Incorrect behaviors in overwrite table for datasource tables

2016-11-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15709938#comment-15709938
 ] 

Apache Spark commented on SPARK-18659:
--

User 'ericl' has created a pull request for this issue:
https://github.com/apache/spark/pull/16088

> Incorrect behaviors in overwrite table for datasource tables
> 
>
> Key: SPARK-18659
> URL: https://issues.apache.org/jira/browse/SPARK-18659
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Eric Liang
>Priority: Blocker
>
> The following test cases fail due to a crash in the Hive client when dropping 
> partitions that don't contain files. The last one deletes too many files due 
> to a partition case-resolution failure.
> {code}
>   test("foo") {
> withTable("test") {
>   spark.range(10)
> .selectExpr("id", "id as A", "'x' as B")
> .write.partitionBy("A", "B").mode("overwrite")
> .saveAsTable("test")
>   spark.sql("insert overwrite table test select id, id, 'x' from 
> range(1)")
>   assert(spark.sql("select * from test").count() == 1)
> }
>   }
>   test("bar") {
> withTable("test") {
>   spark.range(10)
> .selectExpr("id", "id as A", "'x' as B")
> .write.partitionBy("A", "B").mode("overwrite")
> .saveAsTable("test")
>   spark.sql("insert overwrite table test partition (a, b) select id, id, 
> 'x' from range(1)")
>   assert(spark.sql("select * from test").count() == 1)
> }
>   }
>   test("baz") {
> withTable("test") {
>   spark.range(10)
> .selectExpr("id", "id as A", "'x' as B")
> .write.partitionBy("A", "B").mode("overwrite")
> .saveAsTable("test")
>   spark.sql("insert overwrite table test partition (A, B) select id, id, 
> 'x' from range(1)")
>   assert(spark.sql("select * from test").count() == 1)
> }
>   }
>   test("qux") {
> withTable("test") {
>   spark.range(10)
> .selectExpr("id", "id as A", "'x' as B")
> .write.partitionBy("A", "B").mode("overwrite")
> .saveAsTable("test")
>   spark.sql("insert overwrite table test partition (a=1, b) select id, 
> 'x' from range(1)")
>   assert(spark.sql("select * from test").count() == 10)
> }
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18658) Writing to a text DataSource buffers one or more lines in memory

2016-11-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710116#comment-15710116
 ] 

Apache Spark commented on SPARK-18658:
--

User 'NathanHowell' has created a pull request for this issue:
https://github.com/apache/spark/pull/16089

> Writing to a text DataSource buffers one or more lines in memory
> 
>
> Key: SPARK-18658
> URL: https://issues.apache.org/jira/browse/SPARK-18658
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Nathan Howell
>Priority: Minor
>
> The JSON and CSV writing paths buffer entire lines (or multiple lines) in 
> memory prior to writing to disk. For large rows this is inefficient. It may 
> make sense to skip the {{TextOutputFormat}} record writer and go directly to 
> the underlying {{FSDataOutputStream}}, allowing the writers to append 
> arbitrary byte arrays (fractions of a row) instead of a full row.
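A rough sketch of that idea follows (the path and names are illustrative, not the actual writer): open the underlying Hadoop stream directly and append each fragment as it is produced, so nothing larger than a single fragment is buffered.

{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Append partial-row byte fragments straight to an FSDataOutputStream.
def writeFragments(outputPath: String, fragments: Iterator[Array[Byte]]): Unit = {
  val path = new Path(outputPath)
  val fs = path.getFileSystem(new Configuration())
  val out = fs.create(path)
  try {
    fragments.foreach(bytes => out.write(bytes))
  } finally {
    out.close()
  }
}
{code}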



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18658) Writing to a text DataSource buffers one or more lines in memory

2016-11-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18658:


Assignee: (was: Apache Spark)

> Writing to a text DataSource buffers one or more lines in memory
> 
>
> Key: SPARK-18658
> URL: https://issues.apache.org/jira/browse/SPARK-18658
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Nathan Howell
>Priority: Minor
>
> The JSON and CSV writing paths buffer entire lines (or multiple lines) in 
> memory prior to writing to disk. For large rows this is inefficient. It may 
> make sense to skip the {{TextOutputFormat}} record writer and go directly to 
> the underlying {{FSDataOutputStream}}, allowing the writers to append 
> arbitrary byte arrays (fractions of a row) instead of a full row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18658) Writing to a text DataSource buffers one or more lines in memory

2016-11-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18658:


Assignee: Apache Spark

> Writing to a text DataSource buffers one or more lines in memory
> 
>
> Key: SPARK-18658
> URL: https://issues.apache.org/jira/browse/SPARK-18658
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Nathan Howell
>Assignee: Apache Spark
>Priority: Minor
>
> The JSON and CSV writing paths buffer entire lines (or multiple lines) in 
> memory prior to writing to disk. For large rows this is inefficient. It may 
> make sense to skip the {{TextOutputFormat}} record writer and go directly to 
> the underlying {{FSDataOutputStream}}, allowing the writers to append 
> arbitrary byte arrays (fractions of a row) instead of a full row.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18661) Creating a partitioned datasource table should not scan all files for table

2016-11-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18661:


Assignee: (was: Apache Spark)

> Creating a partitioned datasource table should not scan all files for table
> ---
>
> Key: SPARK-18661
> URL: https://issues.apache.org/jira/browse/SPARK-18661
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Eric Liang
>Priority: Blocker
>
> Even though in 2.1 creating a partitioned datasource table will not populate 
> the partition data by default (until the user issues MSCK REPAIR TABLE), it 
> seems we still scan the filesystem for no good reason.
> We should avoid doing this when the user specifies a schema.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18661) Creating a partitioned datasource table should not scan all files for table

2016-11-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18661:


Assignee: Apache Spark

> Creating a partitioned datasource table should not scan all files for table
> ---
>
> Key: SPARK-18661
> URL: https://issues.apache.org/jira/browse/SPARK-18661
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Eric Liang
>Assignee: Apache Spark
>Priority: Blocker
>
> Even though in 2.1 creating a partitioned datasource table will not populate 
> the partition data by default (until the user issues MSCK REPAIR TABLE), it 
> seems we still scan the filesystem for no good reason.
> We should avoid doing this when the user specifies a schema.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18661) Creating a partitioned datasource table should not scan all files for table

2016-11-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710233#comment-15710233
 ] 

Apache Spark commented on SPARK-18661:
--

User 'ericl' has created a pull request for this issue:
https://github.com/apache/spark/pull/16090

> Creating a partitioned datasource table should not scan all files for table
> ---
>
> Key: SPARK-18661
> URL: https://issues.apache.org/jira/browse/SPARK-18661
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Eric Liang
>Priority: Blocker
>
> Even though in 2.1 creating a partitioned datasource table will not populate 
> the partition data by default (until the user issues MSCK REPAIR TABLE), it 
> seems we still scan the filesystem for no good reason.
> We should avoid doing this when the user specifies a schema.
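For reference, the scenario in question looks roughly like the following (table name, columns, and path are made up); with the schema given explicitly, the create statement has nothing to gain from listing files, and partition discovery can wait for the explicit repair.

{code}
spark.sql("""
  CREATE TABLE logs (id BIGINT, payload STRING, dt STRING)
  USING parquet
  OPTIONS (path '/data/logs')
  PARTITIONED BY (dt)
""")
// Partitions are only discovered when the user explicitly asks for it:
spark.sql("MSCK REPAIR TABLE logs")
{code}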



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18122) Fallback to Kryo for unknown classes in ExpressionEncoder

2016-11-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18122:


Assignee: Apache Spark

> Fallback to Kryo for unknown classes in ExpressionEncoder
> -
>
> Key: SPARK-18122
> URL: https://issues.apache.org/jira/browse/SPARK-18122
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.1
>Reporter: Michael Armbrust
>Assignee: Apache Spark
>Priority: Critical
>
> In Spark 2.0 we fail to generate an encoder if any of the fields of the class 
> are not of a supported type.  One example is {{Option\[Set\[Int\]\]}}, but 
> there are many more.  We should give the user the option to fall back on 
> opaque kryo serialization in these cases for subtrees of the encoder, rather 
> than failing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18122) Fallback to Kryo for unknown classes in ExpressionEncoder

2016-11-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18122:


Assignee: (was: Apache Spark)

> Fallback to Kryo for unknown classes in ExpressionEncoder
> -
>
> Key: SPARK-18122
> URL: https://issues.apache.org/jira/browse/SPARK-18122
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.1
>Reporter: Michael Armbrust
>Priority: Critical
>
> In Spark 2.0 we fail to generate an encoder if any of the fields of the class 
> are not of a supported type.  One example is {{Option\[Set\[Int\]\]}}, but 
> there are many more.  We should give the user the option to fall back on 
> opaque kryo serialization in these cases for subtrees of the encoder, rather 
> than failing.
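Until such a fallback exists, the usual workaround is to supply an opaque Kryo encoder explicitly for the offending type; the snippet below is a sketch of that workaround (the {{Record}} class is made up), not the proposed automatic behavior.

{code}
import org.apache.spark.sql.{Encoder, Encoders}

case class Record(id: Long, tags: Option[Set[Int]])

// Encode whole Record values as Kryo-serialized binary blobs instead of
// letting the ExpressionEncoder fail on Option[Set[Int]].
implicit val recordEncoder: Encoder[Record] = Encoders.kryo[Record]
{code}

With the implicit in scope, {{spark.createDataset(Seq(Record(1L, Some(Set(1, 2)))))}} succeeds, at the cost of storing the values as opaque binary rather than columnar fields.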



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18617) Close "kryo auto pick" feature for Spark Streaming

2016-11-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710308#comment-15710308
 ] 

Apache Spark commented on SPARK-18617:
--

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/16091

> Close "kryo auto pick" feature for Spark Streaming
> --
>
> Key: SPARK-18617
> URL: https://issues.apache.org/jira/browse/SPARK-18617
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.2
>Reporter: Genmao Yu
>Assignee: Genmao Yu
> Fix For: 2.1.0
>
>
> [PR-15992|https://github.com/apache/spark/pull/15992] provided a solution to 
> fix the bug that {{receiver data can not be deserialized properly}}. As 
> [~zsxwing] said, it is a critical bug, but we should not break APIs between 
> maintenance releases. A reasonable first step is to disable the {{auto pick kryo 
> serializer}} feature for Spark Streaming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18560) Receiver data can not be dataSerialized properly.

2016-11-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710309#comment-15710309
 ] 

Apache Spark commented on SPARK-18560:
--

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/16091

> Receiver data can not be dataSerialized properly.
> -
>
> Key: SPARK-18560
> URL: https://issues.apache.org/jira/browse/SPARK-18560
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.2
>Reporter: Genmao Yu
>Priority: Critical
>
> My Spark Streaming job runs correctly on Spark 1.6.1, but it does not run 
> properly on Spark 2.0.1, failing with the following exception:
> {code}
> 16/11/22 19:20:15 ERROR executor.Executor: Exception in task 4.3 in stage 6.0 
> (TID 87)
> com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 
> 13994
>   at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137)
>   at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
>   at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781)
>   at 
> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:243)
>   at 
> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:169)
>   at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
>   at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1760)
>   at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1150)
>   at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1150)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1943)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1943)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>   at org.apache.spark.scheduler.Task.run(Task.scala:108)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Digging into the relevant implementation, I find that the type of the data received 
> by {{Receiver}} is erased. In Spark 2.x, the framework chooses an appropriate 
> {{Serializer}} from {{JavaSerializer}} and {{KryoSerializer}} based on the 
> type of the data.
> On the {{Receiver}} side, the type of the data is erased to {{Object}}, so the 
> framework chooses {{JavaSerializer}}, as in the following code:
> {code}
> def canUseKryo(ct: ClassTag[_]): Boolean = {
> primitiveAndPrimitiveArrayClassTags.contains(ct) || ct == stringClassTag
>   }
>   def getSerializer(ct: ClassTag[_]): Serializer = {
> if (canUseKryo(ct)) {
>   kryoSerializer
> } else {
>   defaultSerializer
> }
>   }
> {code}
> On the task side, the correct data type is available, and the framework chooses 
> {{KryoSerializer}} if possible, with the following supported types:
> {code}
> private[this] val stringClassTag: ClassTag[String] = 
> implicitly[ClassTag[String]]
> private[this] val primitiveAndPrimitiveArrayClassTags: Set[ClassTag[_]] = {
> val primitiveClassTags = Set[ClassTag[_]](
>   ClassTag.Boolean,
>   ClassTag.Byte,
>   ClassTag.Char,
>   ClassTag.Double,
>   ClassTag.Float,
>   ClassTag.Int,
>   ClassTag.Long,
>   ClassTag.Null,
>   ClassTag.Short
> )
> val arrayClassTags = primitiveClassTags.map(_.wrap)
> primitiveClassTags ++ arrayClassTags
>   }
> {code}
> In my case, the type of the data is a byte array.
> This problem stems from SPARK-13990, a patch to have Spark automatically pick 
> the "best" serializer when caching RDDs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18662) Move cluster managers into their own sub-directory

2016-11-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18662:


Assignee: Apache Spark

> Move cluster managers into their own sub-directory
> --
>
> Key: SPARK-18662
> URL: https://issues.apache.org/jira/browse/SPARK-18662
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler
>Reporter: Anirudh Ramanathan
>Assignee: Apache Spark
>Priority: Minor
>
> As we move to support Kubernetes in addition to Yarn and Mesos 
> (https://issues.apache.org/jira/browse/SPARK-18278), we should move all the 
> cluster managers into a "resource-managers/" sub-directory. This is simply a 
> reorganization.
> Ref: https://github.com/apache/spark/pull/16061#issuecomment-263649340



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18662) Move cluster managers into their own sub-directory

2016-11-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710391#comment-15710391
 ] 

Apache Spark commented on SPARK-18662:
--

User 'foxish' has created a pull request for this issue:
https://github.com/apache/spark/pull/16092

> Move cluster managers into their own sub-directory
> --
>
> Key: SPARK-18662
> URL: https://issues.apache.org/jira/browse/SPARK-18662
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler
>Reporter: Anirudh Ramanathan
>Priority: Minor
>
> As we move to support Kubernetes in addition to Yarn and Mesos 
> (https://issues.apache.org/jira/browse/SPARK-18278), we should move all the 
> cluster managers into a "resource-managers/" sub-directory. This is simply a 
> reorganization.
> Ref: https://github.com/apache/spark/pull/16061#issuecomment-263649340



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18662) Move cluster managers into their own sub-directory

2016-11-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18662:


Assignee: (was: Apache Spark)

> Move cluster managers into their own sub-directory
> --
>
> Key: SPARK-18662
> URL: https://issues.apache.org/jira/browse/SPARK-18662
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler
>Reporter: Anirudh Ramanathan
>Priority: Minor
>
> As we move to support Kubernetes in addition to Yarn and Mesos 
> (https://issues.apache.org/jira/browse/SPARK-18278), we should move all the 
> cluster managers into a "resource-managers/" sub-directory. This is simply a 
> reorganization.
> Ref: https://github.com/apache/spark/pull/16061#issuecomment-263649340



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18663) Simplify CountMinSketch aggregate implementation

2016-11-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18663:


Assignee: Reynold Xin  (was: Apache Spark)

> Simplify CountMinSketch aggregate implementation
> 
>
> Key: SPARK-18663
> URL: https://issues.apache.org/jira/browse/SPARK-18663
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> SPARK-18429 introduced a count-min sketch aggregate function for SQL, but the 
> implementation and testing are more complicated than needed. This change simplifies 
> the test cases and removes support for data types that don't have clear 
> equality semantics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18663) Simplify CountMinSketch aggregate implementation

2016-11-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710616#comment-15710616
 ] 

Apache Spark commented on SPARK-18663:
--

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/16093

> Simplify CountMinSketch aggregate implementation
> 
>
> Key: SPARK-18663
> URL: https://issues.apache.org/jira/browse/SPARK-18663
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> SPARK-18429 introduced a count-min sketch aggregate function for SQL, but the 
> implementation and testing are more complicated than needed. This change simplifies 
> the test cases and removes support for data types that don't have clear 
> equality semantics.
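For readers unfamiliar with the feature, the DataFrame-side helper that predates the SQL aggregate can be used as follows (the values below are illustrative only):

{code}
import org.apache.spark.util.sketch.CountMinSketch

val df = spark.range(1000).toDF("id")
// eps = 0.01, confidence = 0.95, seed = 42
val sketch: CountMinSketch = df.stat.countMinSketch("id", 0.01, 0.95, 42)
sketch.estimateCount(10L)  // approximate number of rows with id == 10
{code}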



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18663) Simplify CountMinSketch aggregate implementation

2016-11-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18663:


Assignee: Apache Spark  (was: Reynold Xin)

> Simplify CountMinSketch aggregate implementation
> 
>
> Key: SPARK-18663
> URL: https://issues.apache.org/jira/browse/SPARK-18663
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Apache Spark
>
> SPARK-18429 introduced a count-min sketch aggregate function for SQL, but the 
> implementation and testing are more complicated than needed. This change simplifies 
> the test cases and removes support for data types that don't have clear 
> equality semantics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18541) Add pyspark.sql.Column.aliasWithMetadata to allow dynamic metadata management in pyspark SQL API

2016-11-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18541:


Assignee: Apache Spark

> Add pyspark.sql.Column.aliasWithMetadata to allow dynamic metadata management 
> in pyspark SQL API
> 
>
> Key: SPARK-18541
> URL: https://issues.apache.org/jira/browse/SPARK-18541
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 2.0.2
> Environment: all
>Reporter: Shea Parkes
>Assignee: Apache Spark
>Priority: Minor
>  Labels: newbie
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> In the Scala SQL API, you can pass in new metadata when you alias a field.  
> That functionality is not available in the Python API.   Right now, you have 
> to painfully utilize {{SparkSession.createDataFrame}} to manipulate the 
> metadata for even a single column.  I would propose to add the following 
> method to {{pyspark.sql.Column}}:
> {code}
> def aliasWithMetadata(self, name, metadata):
> """
> Make a new Column that has the provided alias and metadata.
> Metadata will be processed with json.dumps()
> """
> _context = pyspark.SparkContext._active_spark_context
> _metadata_str = json.dumps(metadata)
> _metadata_jvm = 
> _context._jvm.org.apache.spark.sql.types.Metadata.fromJson(_metadata_str)
> _new_java_column = getattr(self._jc, 'as')(name, _metadata_jvm)
> return Column(_new_java_column)
> {code}
> I can likely complete this request myself if there is any interest for it.  
> Just have to dust off my knowledge of doctest and the location of the python 
> tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18541) Add pyspark.sql.Column.aliasWithMetadata to allow dynamic metadata management in pyspark SQL API

2016-11-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710705#comment-15710705
 ] 

Apache Spark commented on SPARK-18541:
--

User 'shea-parkes' has created a pull request for this issue:
https://github.com/apache/spark/pull/16094

> Add pyspark.sql.Column.aliasWithMetadata to allow dynamic metadata management 
> in pyspark SQL API
> 
>
> Key: SPARK-18541
> URL: https://issues.apache.org/jira/browse/SPARK-18541
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 2.0.2
> Environment: all
>Reporter: Shea Parkes
>Priority: Minor
>  Labels: newbie
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> In the Scala SQL API, you can pass in new metadata when you alias a field.  
> That functionality is not available in the Python API.   Right now, you have 
> to painfully utilize {{SparkSession.createDataFrame}} to manipulate the 
> metadata for even a single column.  I would propose to add the following 
> method to {{pyspark.sql.Column}}:
> {code}
> def aliasWithMetadata(self, name, metadata):
> """
> Make a new Column that has the provided alias and metadata.
> Metadata will be processed with json.dumps()
> """
> _context = pyspark.SparkContext._active_spark_context
> _metadata_str = json.dumps(metadata)
> _metadata_jvm = 
> _context._jvm.org.apache.spark.sql.types.Metadata.fromJson(_metadata_str)
> _new_java_column = getattr(self._jc, 'as')(name, _metadata_jvm)
> return Column(_new_java_column)
> {code}
> I can likely complete this request myself if there is any interest for it.  
> Just have to dust off my knowledge of doctest and the location of the python 
> tests.
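For comparison, the Scala-side capability that this request mirrors looks like the following (column name and metadata value are made up):

{code}
import org.apache.spark.sql.types.Metadata

val df = spark.range(3).toDF("x")
val withMeta = df.select(df("x").as("x", Metadata.fromJson("""{"comment": "example"}""")))
withMeta.schema("x").metadata  // carries {"comment": "example"}
{code}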



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18541) Add pyspark.sql.Column.aliasWithMetadata to allow dynamic metadata management in pyspark SQL API

2016-11-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18541:


Assignee: (was: Apache Spark)

> Add pyspark.sql.Column.aliasWithMetadata to allow dynamic metadata management 
> in pyspark SQL API
> 
>
> Key: SPARK-18541
> URL: https://issues.apache.org/jira/browse/SPARK-18541
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 2.0.2
> Environment: all
>Reporter: Shea Parkes
>Priority: Minor
>  Labels: newbie
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> In the Scala SQL API, you can pass in new metadata when you alias a field.  
> That functionality is not available in the Python API.   Right now, you have 
> to painfully utilize {{SparkSession.createDataFrame}} to manipulate the 
> metadata for even a single column.  I would propose to add the following 
> method to {{pyspark.sql.Column}}:
> {code}
> def aliasWithMetadata(self, name, metadata):
> """
> Make a new Column that has the provided alias and metadata.
> Metadata will be processed with json.dumps()
> """
> _context = pyspark.SparkContext._active_spark_context
> _metadata_str = json.dumps(metadata)
> _metadata_jvm = 
> _context._jvm.org.apache.spark.sql.types.Metadata.fromJson(_metadata_str)
> _new_java_column = getattr(self._jc, 'as')(name, _metadata_jvm)
> return Column(_new_java_column)
> {code}
> I can likely complete this request myself if there is any interest for it.  
> Just have to dust off my knowledge of doctest and the location of the python 
> tests.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18666) Remove the codes checking deprecated config spark.sql.unsafe.enabled

2016-11-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18666:


Assignee: Apache Spark

> Remove the codes checking deprecated config spark.sql.unsafe.enabled
> 
>
> Key: SPARK-18666
> URL: https://issues.apache.org/jira/browse/SPARK-18666
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Reporter: Liang-Chi Hsieh
>Assignee: Apache Spark
>Priority: Minor
>
> spark.sql.unsafe.enabled has been deprecated since 2.0, but there is still code 
> in the Web UI that checks it. We should remove that check and clean up the code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18666) Remove the codes checking deprecated config spark.sql.unsafe.enabled

2016-11-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710891#comment-15710891
 ] 

Apache Spark commented on SPARK-18666:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/16095

> Remove the codes checking deprecated config spark.sql.unsafe.enabled
> 
>
> Key: SPARK-18666
> URL: https://issues.apache.org/jira/browse/SPARK-18666
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Reporter: Liang-Chi Hsieh
>Priority: Minor
>
> spark.sql.unsafe.enabled has been deprecated since 2.0, but there is still code 
> in the Web UI that checks it. We should remove that check and clean up the code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18666) Remove the codes checking deprecated config spark.sql.unsafe.enabled

2016-11-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18666:


Assignee: (was: Apache Spark)

> Remove the codes checking deprecated config spark.sql.unsafe.enabled
> 
>
> Key: SPARK-18666
> URL: https://issues.apache.org/jira/browse/SPARK-18666
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Reporter: Liang-Chi Hsieh
>Priority: Minor
>
> spark.sql.unsafe.enabled has been deprecated since 2.0, but there is still code 
> in the Web UI that checks it. We should remove that check and clean up the code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18617) Close "kryo auto pick" feature for Spark Streaming

2016-11-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15710928#comment-15710928
 ] 

Apache Spark commented on SPARK-18617:
--

User 'uncleGen' has created a pull request for this issue:
https://github.com/apache/spark/pull/16096

> Close "kryo auto pick" feature for Spark Streaming
> --
>
> Key: SPARK-18617
> URL: https://issues.apache.org/jira/browse/SPARK-18617
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.2
>Reporter: Genmao Yu
>Assignee: Genmao Yu
> Fix For: 2.1.0
>
>
> [PR-15992|https://github.com/apache/spark/pull/15992] provided a solution to 
> fix the bug that {{receiver data can not be deserialized properly}}. As 
> [~zsxwing] said, it is a critical bug, but we should not break APIs between 
> maintenance releases. A reasonable first step is to disable the {{auto pick kryo 
> serializer}} feature for Spark Streaming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18665) Spark ThriftServer jobs where are canceled are still “STARTED”

2016-11-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18665:


Assignee: (was: Apache Spark)

> Spark ThriftServer jobs where are canceled are still “STARTED”
> --
>
> Key: SPARK-18665
> URL: https://issues.apache.org/jira/browse/SPARK-18665
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.3
>Reporter: cen yuhai
> Attachments: 1179ACF7-3E62-44C5-B01D-CA71C876ECCE.png, 
> 83C5E8AD-59DE-4A85-A483-2BE3FB83F378.png
>
>
> I find that some jobs are canceled, but their state is still "STARTED". I 
> think this bug was introduced by SPARK-6964.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18665) Spark ThriftServer jobs where are canceled are still “STARTED”

2016-11-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18665:


Assignee: Apache Spark

> Spark ThriftServer jobs where are canceled are still “STARTED”
> --
>
> Key: SPARK-18665
> URL: https://issues.apache.org/jira/browse/SPARK-18665
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.3
>Reporter: cen yuhai
>Assignee: Apache Spark
> Attachments: 1179ACF7-3E62-44C5-B01D-CA71C876ECCE.png, 
> 83C5E8AD-59DE-4A85-A483-2BE3FB83F378.png
>
>
> I find that some jobs are canceled, but their state is still "STARTED". I 
> think this bug was introduced by SPARK-6964.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18665) Spark ThriftServer jobs where are canceled are still “STARTED”

2016-11-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15711100#comment-15711100
 ] 

Apache Spark commented on SPARK-18665:
--

User 'cenyuhai' has created a pull request for this issue:
https://github.com/apache/spark/pull/16097

> Spark ThriftServer jobs where are canceled are still “STARTED”
> --
>
> Key: SPARK-18665
> URL: https://issues.apache.org/jira/browse/SPARK-18665
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.3
>Reporter: cen yuhai
> Attachments: 1179ACF7-3E62-44C5-B01D-CA71C876ECCE.png, 
> 83C5E8AD-59DE-4A85-A483-2BE3FB83F378.png
>
>
> I find that some jobs are canceled, but their state is still "STARTED". I 
> think this bug was introduced by SPARK-6964.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18672) Close recordwriter in SparkHadoopMapReduceWriter before committing

2016-12-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15711505#comment-15711505
 ] 

Apache Spark commented on SPARK-18672:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/16098

> Close recordwriter in SparkHadoopMapReduceWriter before committing
> --
>
> Key: SPARK-18672
> URL: https://issues.apache.org/jira/browse/SPARK-18672
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Hyukjin Kwon
>
> It seems some APIs such as {{PairRDDFunctions.saveAsHadoopDataset()}} do not 
> close the record writer before issuing the commit for the task.
> On Windows, the output file in the temp directory is still open when the output 
> committer tries to rename it from the temp directory to the output directory 
> after writing finishes, so the rename fails. It seems we should close the writer 
> before committing the task, as other writers such as 
> {{FileFormatWriter}} do.
> Identified failure was as below:
> {code}
> FAILURE! - in org.apache.spark.JavaAPISuite
> writeWithNewAPIHadoopFile(org.apache.spark.JavaAPISuite)  Time elapsed: 0.25 
> sec  <<< ERROR!
> org.apache.spark.SparkException: Job aborted.
>   at 
> org.apache.spark.JavaAPISuite.writeWithNewAPIHadoopFile(JavaAPISuite.java:1231)
> Caused by: org.apache.spark.SparkException: 
> Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most 
> recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor 
> driver): org.apache.spark.SparkException: Task failed while writing rows
>   at 
> org.apache.spark.internal.io.SparkHadoopMapReduceWriter$.org$apache$spark$internal$io$SparkHadoopMapReduceWriter$$executeTask(SparkHadoopMapReduceWriter.scala:182)
>   at 
> org.apache.spark.internal.io.SparkHadoopMapReduceWriter$$anonfun$3.apply(SparkHadoopMapReduceWriter.scala:100)
>   at 
> org.apache.spark.internal.io.SparkHadoopMapReduceWriter$$anonfun$3.apply(SparkHadoopMapReduceWriter.scala:99)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>   at org.apache.spark.scheduler.Task.run(Task.scala:108)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Could not rename 
> file:/C:/projects/spark/core/target/tmp/1480553515529-0/output/_temporary/0/_temporary/attempt_20161201005155__r_00_0
>  to 
> file:/C:/projects/spark/core/target/tmp/1480553515529-0/output/_temporary/0/task_20161201005155__r_00
>   at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:436)
>   at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:415)
>   at 
> org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:50)
>   at 
> org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:76)
>   at 
> org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitTask(HadoopMapReduceCommitProtocol.scala:153)
>   at 
> org.apache.spark.internal.io.SparkHadoopMapReduceWriter$$anonfun$4.apply(SparkHadoopMapReduceWriter.scala:167)
>   at 
> org.apache.spark.internal.io.SparkHadoopMapReduceWriter$$anonfun$4.apply(SparkHadoopMapReduceWriter.scala:156)
>   at 
> org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1341)
>   at 
> org.apache.spark.internal.io.SparkHadoopMapReduceWriter$.org$apache$spark$internal$io$SparkHadoopMapReduceWriter$$executeTask(SparkHadoopMapReduceWriter.scala:168)
>   ... 8 more
> Driver stacktrace:
>   at 
> org.apache.spark.JavaAPISuite.writeWithNewAPIHadoopFile(JavaAPISuite.java:1231)
> Caused by: org.apache.spark.SparkException: Task failed while writing rows
> Caused by: java.io.IOException: Could not rename 
> file:/C:/projects/spark/core/target/tmp/1480553515529-0/output/_temporary/0/_temporary/attempt_20161201005155__r_00_0
>  to 
> file:/C:/projects/spark/core/target/tmp/1480553515529-0/output/_temporary/0/task_20161201005155__r_00
> {code}
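The ordering being asked for can be sketched as follows (the names here are placeholders, not the actual {{SparkHadoopMapReduceWriter}} code): the record writer is closed first, so the committer never tries to rename a file that is still open.

{code}
import org.apache.hadoop.mapreduce.{OutputCommitter, RecordWriter, TaskAttemptContext}

def writeAndCommit[K, V](
    writer: RecordWriter[K, V],
    committer: OutputCommitter,
    context: TaskAttemptContext,
    records: Iterator[(K, V)]): Unit = {
  try {
    records.foreach { case (k, v) => writer.write(k, v) }
  } finally {
    writer.close(context)  // close before committing, or the rename fails on Windows
  }
  if (committer.needsTaskCommit(context)) {
    committer.commitTask(context)
  }
}
{code}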



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18672) Close recordwriter in SparkHadoopMapReduceWriter before committing

2016-12-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18672:


Assignee: Apache Spark

> Close recordwriter in SparkHadoopMapReduceWriter before committing
> --
>
> Key: SPARK-18672
> URL: https://issues.apache.org/jira/browse/SPARK-18672
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>
> It seems some APIs such as {{PairRDDFunctions.saveAsHadoopDataset()}} do not 
> close the record writer before issuing the commit for the task.
> On Windows, the output file in the temp directory is still open when the output 
> committer tries to rename it from the temp directory to the output directory 
> after writing finishes, so the rename fails. It seems we should close the writer 
> before committing the task, as other writers such as 
> {{FileFormatWriter}} do.
> Identified failure was as below:
> {code}
> FAILURE! - in org.apache.spark.JavaAPISuite
> writeWithNewAPIHadoopFile(org.apache.spark.JavaAPISuite)  Time elapsed: 0.25 
> sec  <<< ERROR!
> org.apache.spark.SparkException: Job aborted.
>   at 
> org.apache.spark.JavaAPISuite.writeWithNewAPIHadoopFile(JavaAPISuite.java:1231)
> Caused by: org.apache.spark.SparkException: 
> Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most 
> recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor 
> driver): org.apache.spark.SparkException: Task failed while writing rows
>   at 
> org.apache.spark.internal.io.SparkHadoopMapReduceWriter$.org$apache$spark$internal$io$SparkHadoopMapReduceWriter$$executeTask(SparkHadoopMapReduceWriter.scala:182)
>   at 
> org.apache.spark.internal.io.SparkHadoopMapReduceWriter$$anonfun$3.apply(SparkHadoopMapReduceWriter.scala:100)
>   at 
> org.apache.spark.internal.io.SparkHadoopMapReduceWriter$$anonfun$3.apply(SparkHadoopMapReduceWriter.scala:99)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>   at org.apache.spark.scheduler.Task.run(Task.scala:108)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Could not rename 
> file:/C:/projects/spark/core/target/tmp/1480553515529-0/output/_temporary/0/_temporary/attempt_20161201005155__r_00_0
>  to 
> file:/C:/projects/spark/core/target/tmp/1480553515529-0/output/_temporary/0/task_20161201005155__r_00
>   at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:436)
>   at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:415)
>   at 
> org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:50)
>   at 
> org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:76)
>   at 
> org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitTask(HadoopMapReduceCommitProtocol.scala:153)
>   at 
> org.apache.spark.internal.io.SparkHadoopMapReduceWriter$$anonfun$4.apply(SparkHadoopMapReduceWriter.scala:167)
>   at 
> org.apache.spark.internal.io.SparkHadoopMapReduceWriter$$anonfun$4.apply(SparkHadoopMapReduceWriter.scala:156)
>   at 
> org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1341)
>   at 
> org.apache.spark.internal.io.SparkHadoopMapReduceWriter$.org$apache$spark$internal$io$SparkHadoopMapReduceWriter$$executeTask(SparkHadoopMapReduceWriter.scala:168)
>   ... 8 more
> Driver stacktrace:
>   at 
> org.apache.spark.JavaAPISuite.writeWithNewAPIHadoopFile(JavaAPISuite.java:1231)
> Caused by: org.apache.spark.SparkException: Task failed while writing rows
> Caused by: java.io.IOException: Could not rename 
> file:/C:/projects/spark/core/target/tmp/1480553515529-0/output/_temporary/0/_temporary/attempt_20161201005155__r_00_0
>  to 
> file:/C:/projects/spark/core/target/tmp/1480553515529-0/output/_temporary/0/task_20161201005155__r_00
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18672) Close recordwriter in SparkHadoopMapReduceWriter before committing

2016-12-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18672:


Assignee: (was: Apache Spark)

> Close recordwriter in SparkHadoopMapReduceWriter before committing
> --
>
> Key: SPARK-18672
> URL: https://issues.apache.org/jira/browse/SPARK-18672
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Hyukjin Kwon
>
> It seems some APIs such as {{PairRDDFunctions.saveAsHadoopDataset()}} do not 
> close the record writer before issuing the commit for the task.
> On Windows, the output file in the temp directory is still open while the 
> output committer tries to rename it from the temp directory to the output 
> directory after writing finishes, so the file cannot be moved. It seems we 
> should close the writer before committing the task, as other writers such as 
> {{FileFormatWriter}} do.
> The identified failure was as follows:
> {code}
> FAILURE! - in org.apache.spark.JavaAPISuite
> writeWithNewAPIHadoopFile(org.apache.spark.JavaAPISuite)  Time elapsed: 0.25 
> sec  <<< ERROR!
> org.apache.spark.SparkException: Job aborted.
>   at 
> org.apache.spark.JavaAPISuite.writeWithNewAPIHadoopFile(JavaAPISuite.java:1231)
> Caused by: org.apache.spark.SparkException: 
> Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most 
> recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor 
> driver): org.apache.spark.SparkException: Task failed while writing rows
>   at 
> org.apache.spark.internal.io.SparkHadoopMapReduceWriter$.org$apache$spark$internal$io$SparkHadoopMapReduceWriter$$executeTask(SparkHadoopMapReduceWriter.scala:182)
>   at 
> org.apache.spark.internal.io.SparkHadoopMapReduceWriter$$anonfun$3.apply(SparkHadoopMapReduceWriter.scala:100)
>   at 
> org.apache.spark.internal.io.SparkHadoopMapReduceWriter$$anonfun$3.apply(SparkHadoopMapReduceWriter.scala:99)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>   at org.apache.spark.scheduler.Task.run(Task.scala:108)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Could not rename 
> file:/C:/projects/spark/core/target/tmp/1480553515529-0/output/_temporary/0/_temporary/attempt_20161201005155__r_00_0
>  to 
> file:/C:/projects/spark/core/target/tmp/1480553515529-0/output/_temporary/0/task_20161201005155__r_00
>   at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:436)
>   at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:415)
>   at 
> org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:50)
>   at 
> org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:76)
>   at 
> org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitTask(HadoopMapReduceCommitProtocol.scala:153)
>   at 
> org.apache.spark.internal.io.SparkHadoopMapReduceWriter$$anonfun$4.apply(SparkHadoopMapReduceWriter.scala:167)
>   at 
> org.apache.spark.internal.io.SparkHadoopMapReduceWriter$$anonfun$4.apply(SparkHadoopMapReduceWriter.scala:156)
>   at 
> org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1341)
>   at 
> org.apache.spark.internal.io.SparkHadoopMapReduceWriter$.org$apache$spark$internal$io$SparkHadoopMapReduceWriter$$executeTask(SparkHadoopMapReduceWriter.scala:168)
>   ... 8 more
> Driver stacktrace:
>   at 
> org.apache.spark.JavaAPISuite.writeWithNewAPIHadoopFile(JavaAPISuite.java:1231)
> Caused by: org.apache.spark.SparkException: Task failed while writing rows
> Caused by: java.io.IOException: Could not rename 
> file:/C:/projects/spark/core/target/tmp/1480553515529-0/output/_temporary/0/_temporary/attempt_20161201005155__r_00_0
>  to 
> file:/C:/projects/spark/core/target/tmp/1480553515529-0/output/_temporary/0/task_20161201005155__r_00
> {code}
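> A minimal sketch of the intended ordering (illustrative only; the method and 
> variable names below are placeholders, not the actual 
> {{SparkHadoopMapReduceWriter}} code):
> {code}
> import org.apache.hadoop.mapreduce.{OutputCommitter, RecordWriter, TaskAttemptContext}
>
> // Write all records, close the RecordWriter, and only then commit the task,
> // so the committer renames a fully closed file (this matters on Windows).
> def writeAndCommit[K, V](
>     records: Iterator[(K, V)],
>     writer: RecordWriter[K, V],
>     committer: OutputCommitter,
>     context: TaskAttemptContext): Unit = {
>   try {
>     records.foreach { case (k, v) => writer.write(k, v) }
>   } finally {
>     writer.close(context)          // close first ...
>   }
>   committer.commitTask(context)    // ... then commit
> }
> {code}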



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18665) Spark ThriftServer jobs that are canceled are still “STARTED”

2016-12-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15711830#comment-15711830
 ] 

Apache Spark commented on SPARK-18665:
--

User 'cenyuhai' has created a pull request for this issue:
https://github.com/apache/spark/pull/16099

> Spark ThriftServer jobs that are canceled are still “STARTED”
> --
>
> Key: SPARK-18665
> URL: https://issues.apache.org/jira/browse/SPARK-18665
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.3
>Reporter: cen yuhai
> Attachments: 1179ACF7-3E62-44C5-B01D-CA71C876ECCE.png, 
> 83C5E8AD-59DE-4A85-A483-2BE3FB83F378.png
>
>
> I find that some jobs are canceled, but their state is still "STARTED". I 
> think this bug was introduced by SPARK-6964.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18674) improve the error message of natural join

2016-12-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18674:


Assignee: Wenchen Fan  (was: Apache Spark)

> improve the error message of natural join
> -
>
> Key: SPARK-18674
> URL: https://issues.apache.org/jira/browse/SPARK-18674
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18674) improve the error message of natural join

2016-12-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18674:


Assignee: Apache Spark  (was: Wenchen Fan)

> improve the error message of natural join
> -
>
> Key: SPARK-18674
> URL: https://issues.apache.org/jira/browse/SPARK-18674
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18674) improve the error message of natural join

2016-12-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15712130#comment-15712130
 ] 

Apache Spark commented on SPARK-18674:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/16100

> improve the error message of natural join
> -
>
> Key: SPARK-18674
> URL: https://issues.apache.org/jira/browse/SPARK-18674
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18674) improve the error message of natural join

2016-12-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18674:


Assignee: Wenchen Fan  (was: Apache Spark)

> improve the error message of natural join
> -
>
> Key: SPARK-18674
> URL: https://issues.apache.org/jira/browse/SPARK-18674
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18674) improve the error message of natural join

2016-12-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18674:


Assignee: Apache Spark  (was: Wenchen Fan)

> improve the error message of natural join
> -
>
> Key: SPARK-18674
> URL: https://issues.apache.org/jira/browse/SPARK-18674
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18586) netty-3.8.0.Final.jar has vulnerability CVE-2014-3488 and CVE-2014-0193

2016-12-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18586:


Assignee: (was: Apache Spark)

> netty-3.8.0.Final.jar has vulnerability CVE-2014-3488  and CVE-2014-0193
> 
>
> Key: SPARK-18586
> URL: https://issues.apache.org/jira/browse/SPARK-18586
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Reporter: meiyoula
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18586) netty-3.8.0.Final.jar has vulnerability CVE-2014-3488 and CVE-2014-0193

2016-12-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15712341#comment-15712341
 ] 

Apache Spark commented on SPARK-18586:
--

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/16102

> netty-3.8.0.Final.jar has vulnerability CVE-2014-3488  and CVE-2014-0193
> 
>
> Key: SPARK-18586
> URL: https://issues.apache.org/jira/browse/SPARK-18586
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Reporter: meiyoula
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18586) netty-3.8.0.Final.jar has vulnerability CVE-2014-3488 and CVE-2014-0193

2016-12-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18586:


Assignee: Apache Spark

> netty-3.8.0.Final.jar has vulnerability CVE-2014-3488  and CVE-2014-0193
> 
>
> Key: SPARK-18586
> URL: https://issues.apache.org/jira/browse/SPARK-18586
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Reporter: meiyoula
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18475) Be able to provide higher parallelization for StructuredStreaming Kafka Source

2016-12-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18475:


Assignee: (was: Apache Spark)

> Be able to provide higher parallelization for StructuredStreaming Kafka Source
> --
>
> Key: SPARK-18475
> URL: https://issues.apache.org/jira/browse/SPARK-18475
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.0.2, 2.1.0
>Reporter: Burak Yavuz
>
> Right now the StructuredStreaming Kafka Source creates as many Spark tasks as 
> there are TopicPartitions that we're going to read from Kafka.
> This doesn't work well when we have data skew, and there is no reason why we 
> shouldn't be able to increase parallelism further, i.e. have multiple Spark 
> tasks reading from the same Kafka TopicPartition.
> This means we won't be able to use the "CachedKafkaConsumer" for its intended 
> purpose (being cached) in this use case, but the extra overhead is worth it to 
> handle data skew and increase parallelism, especially in ETL use cases.
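> As an illustration only (the case class and method below are hypothetical, not 
> part of the Kafka source), one way to get several tasks per TopicPartition is 
> to slice its offset range:
> {code}
> // Split one TopicPartition's offset range [from, until) into `slices` pieces.
> case class OffsetSlice(topic: String, partition: Int, from: Long, until: Long)
>
> def splitRange(topic: String, partition: Int,
>                from: Long, until: Long, slices: Int): Seq[OffsetSlice] = {
>   val total = until - from
>   (0 until slices).map { i =>
>     OffsetSlice(topic, partition,
>       from + total * i / slices,
>       from + total * (i + 1) / slices)
>   }.filter(s => s.until > s.from)
> }
>
> // splitRange("events", 0, 0L, 10L, 3)
> //   -> OffsetSlice(events,0,0,3), OffsetSlice(events,0,3,6), OffsetSlice(events,0,6,10)
> {code}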



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org




[jira] [Assigned] (SPARK-18475) Be able to provide higher parallelization for StructuredStreaming Kafka Source

2016-12-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18475:


Assignee: Apache Spark

> Be able to provide higher parallelization for StructuredStreaming Kafka Source
> --
>
> Key: SPARK-18475
> URL: https://issues.apache.org/jira/browse/SPARK-18475
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.0.2, 2.1.0
>Reporter: Burak Yavuz
>Assignee: Apache Spark
>
> Right now the StructuredStreaming Kafka Source creates as many Spark tasks as 
> there are TopicPartitions that we're going to read from Kafka.
> This doesn't work well when we have data skew, and there is no reason why we 
> shouldn't be able to increase parallelism further, i.e. have multiple Spark 
> tasks reading from the same Kafka TopicPartition.
> This means we won't be able to use the "CachedKafkaConsumer" for its intended 
> purpose (being cached) in this use case, but the extra overhead is worth it to 
> handle data skew and increase parallelism, especially in ETL use cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18374) Incorrect words in StopWords/english.txt

2016-12-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15712533#comment-15712533
 ] 

Apache Spark commented on SPARK-18374:
--

User 'hhbyyh' has created a pull request for this issue:
https://github.com/apache/spark/pull/16103

> Incorrect words in StopWords/english.txt
> 
>
> Key: SPARK-18374
> URL: https://issues.apache.org/jira/browse/SPARK-18374
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.0.1
>Reporter: nirav patel
>
> I was double-checking english.txt's list of stopwords because I felt it was 
> taking out valid tokens like 'won'. I think the issue is that the english.txt 
> list is missing the apostrophe character and everything after it, so "won't" 
> became "won" in that list and "wouldn't" became "wouldn".
> Here are some incorrect tokens in this list:
> won
> wouldn
> ma
> mightn
> mustn
> needn
> shan
> shouldn
> wasn
> weren
> I think the ideal list should have both styles, i.e. both won't and wont 
> should be part of english.txt, since some tokenizers might remove special 
> characters. But 'won' obviously shouldn't be in this list.
> Here's list of snowball english stop words:
> http://snowball.tartarus.org/algorithms/english/stop.txt
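> A quick way to see the effect with {{StopWordsRemover}} in spark-shell (sketch 
> only; the DataFrame and column names are made up):
> {code}
> import org.apache.spark.ml.feature.StopWordsRemover
>
> // Uses the default english stop word list shipped with Spark.
> val remover = new StopWordsRemover()
>   .setInputCol("tokens")
>   .setOutputCol("filtered")
>
> val df = spark.createDataFrame(Seq(
>   (0, Seq("we", "won", "the", "game"))
> )).toDF("id", "tokens")
>
> // "won" is dropped along with "we" and "the" because of the truncated entry.
> remover.transform(df).show(false)
> {code}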



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18374) Incorrect words in StopWords/english.txt

2016-12-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18374:


Assignee: (was: Apache Spark)

> Incorrect words in StopWords/english.txt
> 
>
> Key: SPARK-18374
> URL: https://issues.apache.org/jira/browse/SPARK-18374
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.0.1
>Reporter: nirav patel
>
> I was double-checking english.txt's list of stopwords because I felt it was 
> taking out valid tokens like 'won'. I think the issue is that the english.txt 
> list is missing the apostrophe character and everything after it, so "won't" 
> became "won" in that list and "wouldn't" became "wouldn".
> Here are some incorrect tokens in this list:
> won
> wouldn
> ma
> mightn
> mustn
> needn
> shan
> shouldn
> wasn
> weren
> I think the ideal list should have both styles, i.e. both won't and wont 
> should be part of english.txt, since some tokenizers might remove special 
> characters. But 'won' obviously shouldn't be in this list.
> Here's list of snowball english stop words:
> http://snowball.tartarus.org/algorithms/english/stop.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18374) Incorrect words in StopWords/english.txt

2016-12-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18374:


Assignee: Apache Spark

> Incorrect words in StopWords/english.txt
> 
>
> Key: SPARK-18374
> URL: https://issues.apache.org/jira/browse/SPARK-18374
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.0.1
>Reporter: nirav patel
>Assignee: Apache Spark
>
> I was double-checking english.txt's list of stopwords because I felt it was 
> taking out valid tokens like 'won'. I think the issue is that the english.txt 
> list is missing the apostrophe character and everything after it, so "won't" 
> became "won" in that list and "wouldn't" became "wouldn".
> Here are some incorrect tokens in this list:
> won
> wouldn
> ma
> mightn
> mustn
> needn
> shan
> shouldn
> wasn
> weren
> I think the ideal list should have both styles, i.e. both won't and wont 
> should be part of english.txt, since some tokenizers might remove special 
> characters. But 'won' obviously shouldn't be in this list.
> Here's list of snowball english stop words:
> http://snowball.tartarus.org/algorithms/english/stop.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18675) CTAS for hive serde table should work for all hive versions

2016-12-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15712559#comment-15712559
 ] 

Apache Spark commented on SPARK-18675:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/16104

> CTAS for hive serde table should work for all hive versions
> ---
>
> Key: SPARK-18675
> URL: https://issues.apache.org/jira/browse/SPARK-18675
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18675) CTAS for hive serde table should work for all hive versions

2016-12-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18675:


Assignee: Wenchen Fan  (was: Apache Spark)

> CTAS for hive serde table should work for all hive versions
> ---
>
> Key: SPARK-18675
> URL: https://issues.apache.org/jira/browse/SPARK-18675
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18675) CTAS for hive serde table should work for all hive versions

2016-12-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18675:


Assignee: Apache Spark  (was: Wenchen Fan)

> CTAS for hive serde table should work for all hive versions
> ---
>
> Key: SPARK-18675
> URL: https://issues.apache.org/jira/browse/SPARK-18675
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18560) Receiver data can not be deserialized properly.

2016-12-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15712837#comment-15712837
 ] 

Apache Spark commented on SPARK-18560:
--

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/16105

> Receiver data can not be deserialized properly.
> -
>
> Key: SPARK-18560
> URL: https://issues.apache.org/jira/browse/SPARK-18560
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.2
>Reporter: Genmao Yu
>Priority: Critical
>
> My Spark Streaming job runs correctly on Spark 1.6.1, but it can not run 
> properly on Spark 2.0.1, failing with the following exception:
> {code}
> 16/11/22 19:20:15 ERROR executor.Executor: Exception in task 4.3 in stage 6.0 
> (TID 87)
> com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 
> 13994
>   at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137)
>   at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
>   at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781)
>   at 
> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:243)
>   at 
> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:169)
>   at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
>   at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1760)
>   at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1150)
>   at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1150)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1943)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1943)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>   at org.apache.spark.scheduler.Task.run(Task.scala:108)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Digging into the relevant implementation, I find that the type of the data 
> received by {{Receiver}} is erased. In Spark 2.x, the framework can choose an 
> appropriate {{Serializer}} from {{JavaSerializer}} and {{KryoSerializer}} 
> based on the type of the data.
> At the {{Receiver}} side, the type of the data is erased to {{Object}}, so the 
> framework will choose {{JavaSerializer}}, per the following code:
> {code}
> def canUseKryo(ct: ClassTag[_]): Boolean = {
>   primitiveAndPrimitiveArrayClassTags.contains(ct) || ct == stringClassTag
> }
>
> def getSerializer(ct: ClassTag[_]): Serializer = {
>   if (canUseKryo(ct)) {
>     kryoSerializer
>   } else {
>     defaultSerializer
>   }
> }
> {code}
> At the task side, we can get the correct data type, and the framework will 
> choose {{KryoSerializer}} if possible, with the following supported types:
> {code}
> private[this] val stringClassTag: ClassTag[String] = implicitly[ClassTag[String]]
>
> private[this] val primitiveAndPrimitiveArrayClassTags: Set[ClassTag[_]] = {
>   val primitiveClassTags = Set[ClassTag[_]](
>     ClassTag.Boolean,
>     ClassTag.Byte,
>     ClassTag.Char,
>     ClassTag.Double,
>     ClassTag.Float,
>     ClassTag.Int,
>     ClassTag.Long,
>     ClassTag.Null,
>     ClassTag.Short
>   )
>   val arrayClassTags = primitiveClassTags.map(_.wrap)
>   primitiveClassTags ++ arrayClassTags
> }
> {code}
> In my case, the type of the data is a byte array ({{Array[Byte]}}).
> This problem stems from SPARK-13990, a patch to have Spark automatically pick 
> the "best" serializer when caching RDDs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18617) Close "kryo auto pick" feature for Spark Streaming

2016-12-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15712835#comment-15712835
 ] 

Apache Spark commented on SPARK-18617:
--

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/16105

> Close "kryo auto pick" feature for Spark Streaming
> --
>
> Key: SPARK-18617
> URL: https://issues.apache.org/jira/browse/SPARK-18617
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.2
>Reporter: Genmao Yu
>Assignee: Genmao Yu
> Fix For: 2.0.3, 2.1.0
>
>
> [PR-15992| https://github.com/apache/spark/pull/15992] provided a solution to 
> fix the bug, i.e. {{receiver data can not be deserialized properly}}. As 
> [~zsxwing] said, it is a critical bug, but we should not break APIs between 
> maintenance releases. It may be a reasonable first step to disable the {{auto 
> pick kryo serializer}} feature for Spark Streaming.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-17213) Parquet String Pushdown for Non-Eq Comparisons Broken

2016-12-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17213:


Assignee: Apache Spark  (was: Cheng Lian)

> Parquet String Pushdown for Non-Eq Comparisons Broken
> -
>
> Key: SPARK-17213
> URL: https://issues.apache.org/jira/browse/SPARK-17213
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Andrew Duffy
>Assignee: Apache Spark
>
> Spark defines ordering over strings based on comparison of UTF8 byte arrays, 
> which compares bytes as unsigned integers. Currently, however, Parquet does 
> not respect this ordering. This is being fixed in Parquet (JIRA and PR links 
> below), but for now all string filters are broken, and there is an actual 
> correctness issue for {{>}} and {{<}}.
> *Repro:*
> Querying directly from in-memory DataFrame:
> {code}
> > Seq("a", "é").toDF("name").where("name > 'a'").count
> 1
> {code}
> Querying from a parquet dataset:
> {code}
> > Seq("a", "é").toDF("name").write.parquet("/tmp/bad")
> > spark.read.parquet("/tmp/bad").where("name > 'a'").count
> 0
> {code}
> This happens because Spark sorts the rows to be {{[a, é]}}, but Parquet's 
> implementation of comparison of strings is based on signed byte array 
> comparison, so it will actually create 1 row group with statistics 
> {{min=é,max=a}}, and so the row group will be dropped by the query.
> Based on the way Parquet pushes down Eq, equality filters are not a 
> correctness issue, but they will force you to read row groups you should be 
> able to skip.
> Link to PARQUET issue: https://issues.apache.org/jira/browse/PARQUET-686
> Link to PR: https://github.com/apache/parquet-mr/pull/362
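> The signed vs. unsigned ordering can be seen directly (illustration, not from 
> the original report): "é" is encoded in UTF-8 as the bytes 0xC3 0xA9, and 0xC3 
> as a signed byte is negative.
> {code}
> val a = "a".getBytes("UTF-8")   // Array(0x61)        ->  97
> val e = "é".getBytes("UTF-8")   // Array(0xC3, 0xA9)  -> -61, -87 as signed bytes
>
> // Signed comparison (what the Parquet statistics use): é sorts *before* a
> java.lang.Byte.compare(e(0), a(0)) < 0                    // true
>
> // Unsigned comparison (Spark's UTF8 ordering): é sorts *after* a
> java.lang.Integer.compare(e(0) & 0xFF, a(0) & 0xFF) > 0   // true
> {code}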



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17213) Parquet String Pushdown for Non-Eq Comparisons Broken

2016-12-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15712921#comment-15712921
 ] 

Apache Spark commented on SPARK-17213:
--

User 'liancheng' has created a pull request for this issue:
https://github.com/apache/spark/pull/16106

> Parquet String Pushdown for Non-Eq Comparisons Broken
> -
>
> Key: SPARK-17213
> URL: https://issues.apache.org/jira/browse/SPARK-17213
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Andrew Duffy
>Assignee: Cheng Lian
>
> Spark defines ordering over strings based on comparison of UTF8 byte arrays, 
> which compares bytes as unsigned integers. Currently, however, Parquet does 
> not respect this ordering. This is being fixed in Parquet (JIRA and PR links 
> below), but for now all string filters are broken, and there is an actual 
> correctness issue for {{>}} and {{<}}.
> *Repro:*
> Querying directly from in-memory DataFrame:
> {code}
> > Seq("a", "é").toDF("name").where("name > 'a'").count
> 1
> {code}
> Querying from a parquet dataset:
> {code}
> > Seq("a", "é").toDF("name").write.parquet("/tmp/bad")
> > spark.read.parquet("/tmp/bad").where("name > 'a'").count
> 0
> {code}
> This happens because Spark sorts the rows to be {{[a, é]}}, but Parquet's 
> implementation of comparison of strings is based on signed byte array 
> comparison, so it will actually create 1 row group with statistics 
> {{min=é,max=a}}, and so the row group will be dropped by the query.
> Based on the way Parquet pushes down Eq, equality filters are not a 
> correctness issue, but they will force you to read row groups you should be 
> able to skip.
> Link to PARQUET issue: https://issues.apache.org/jira/browse/PARQUET-686
> Link to PR: https://github.com/apache/parquet-mr/pull/362



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-17213) Parquet String Pushdown for Non-Eq Comparisons Broken

2016-12-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17213:


Assignee: Cheng Lian  (was: Apache Spark)

> Parquet String Pushdown for Non-Eq Comparisons Broken
> -
>
> Key: SPARK-17213
> URL: https://issues.apache.org/jira/browse/SPARK-17213
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Andrew Duffy
>Assignee: Cheng Lian
>
> Spark defines ordering over strings based on comparison of UTF8 byte arrays, 
> which compares bytes as unsigned integers. Currently, however, Parquet does 
> not respect this ordering. This is being fixed in Parquet (JIRA and PR links 
> below), but for now all string filters are broken, and there is an actual 
> correctness issue for {{>}} and {{<}}.
> *Repro:*
> Querying directly from in-memory DataFrame:
> {code}
> > Seq("a", "é").toDF("name").where("name > 'a'").count
> 1
> {code}
> Querying from a parquet dataset:
> {code}
> > Seq("a", "é").toDF("name").write.parquet("/tmp/bad")
> > spark.read.parquet("/tmp/bad").where("name > 'a'").count
> 0
> {code}
> This happens because Spark sorts the rows to be {{[a, é]}}, but Parquet's 
> implementation of comparison of strings is based on signed byte array 
> comparison, so it will actually create 1 row group with statistics 
> {{min=é,max=a}}, and so the row group will be dropped by the query.
> Based on the way Parquet pushes down Eq, equality filters are not a 
> correctness issue, but they will force you to read row groups you should be 
> able to skip.
> Link to PARQUET issue: https://issues.apache.org/jira/browse/PARQUET-686
> Link to PR: https://github.com/apache/parquet-mr/pull/362



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18677) Json path implementation fails to parse ['key']

2016-12-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15712996#comment-15712996
 ] 

Apache Spark commented on SPARK-18677:
--

User 'rdblue' has created a pull request for this issue:
https://github.com/apache/spark/pull/16107

> Json path implementation fails to parse ['key']
> ---
>
> Key: SPARK-18677
> URL: https://issues.apache.org/jira/browse/SPARK-18677
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Ryan Blue
>
> The current json path parser fails to parse expressions like ['key'], which 
> are used for named expressions with spaces.
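> A hypothetical repro in spark-shell (the JSON document and key are made up):
> {code}
> import org.apache.spark.sql.functions.get_json_object
> import spark.implicits._
>
> val df = Seq("""{"a key": 1}""").toDF("json")
>
> // $['a key'] is the bracket-quoted form needed for keys containing spaces;
> // because the current parser cannot parse it, the result comes back null.
> df.select(get_json_object($"json", "$['a key']")).show()
> {code}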



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18677) Json path implementation fails to parse ['key']

2016-12-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18677:


Assignee: Apache Spark

> Json path implementation fails to parse ['key']
> ---
>
> Key: SPARK-18677
> URL: https://issues.apache.org/jira/browse/SPARK-18677
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Ryan Blue
>Assignee: Apache Spark
>
> The current json path parser fails to parse expressions like ['key'], which 
> are used for named expressions with spaces.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


