[jira] [Commented] (SPARK-6305) Add support for log4j 2.x to Spark

2016-07-30 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400993#comment-15400993
 ] 

Reynold Xin commented on SPARK-6305:


[~srowen] looked into this in the past and didn't get everything working. 
Sean, can you share more?


> Add support for log4j 2.x to Spark
> --
>
> Key: SPARK-6305
> URL: https://issues.apache.org/jira/browse/SPARK-6305
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Tal Sliwowicz
>Priority: Minor
>
> log4j 2 requires replacing the slf4j binding and adding the log4j 2 jars to the 
> classpath. Since there are shaded jars, this must be done during the build.
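
For context, here is a hypothetical build.sbt sketch of the kind of dependency swap described above, for an application built against Spark (it is not taken from this ticket, and the version numbers are illustrative): exclude the log4j 1.x binding that Spark pulls in and add the log4j 2.x jars plus its slf4j bridge.

{code}
// Hypothetical sketch: route Spark's slf4j logging to log4j 2.x in an application build.
// Coordinates are standard Maven coordinates; versions are illustrative.
libraryDependencies ++= Seq(
  ("org.apache.spark" %% "spark-core" % "2.0.0")
    .exclude("org.slf4j", "slf4j-log4j12")  // drop the slf4j -> log4j 1.x binding
    .exclude("log4j", "log4j"),             // drop log4j 1.x itself
  "org.apache.logging.log4j" % "log4j-api"        % "2.6.2",
  "org.apache.logging.log4j" % "log4j-core"       % "2.6.2",
  "org.apache.logging.log4j" % "log4j-slf4j-impl" % "2.6.2"  // slf4j -> log4j 2.x bridge
)
{code}

As the description notes, this only covers an application's own classpath; because Spark ships shaded assembly jars, the same substitution would have to happen in Spark's own build to take effect everywhere.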



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-16812) Open up SparkILoop.getAddedJars

2016-07-30 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-16812.
-
   Resolution: Fixed
Fix Version/s: 2.1.0
   2.0.1

> Open up SparkILoop.getAddedJars
> ---
>
> Key: SPARK-16812
> URL: https://issues.apache.org/jira/browse/SPARK-16812
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Shell
>Reporter: Reynold Xin
>Assignee: Reynold Xin
> Fix For: 2.0.1, 2.1.0
>
>
> SparkILoop.getAddedJars is a useful method because it lets us programmatically 
> get the list of jars that have been added.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16818) Exchange reuse incorrectly reuses scans over different sets of partitions

2016-07-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400981#comment-15400981
 ] 

Apache Spark commented on SPARK-16818:
--

User 'ericl' has created a pull request for this issue:
https://github.com/apache/spark/pull/14427

> Exchange reuse incorrectly reuses scans over different sets of partitions
> -
>
> Key: SPARK-16818
> URL: https://issues.apache.org/jira/browse/SPARK-16818
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Eric Liang
>Assignee: Eric Liang
>Priority: Critical
> Fix For: 2.1.0
>
>
> This happens because the file scan operator does not take into account 
> partition pruning in its implementation of `sameResult()`. As a result, 
> executions may be incorrect on self-joins over the same base file relation. 
> Here's a minimal test case to reproduce:
> {code}
> spark.conf.set("spark.sql.exchange.reuse", true)  // defaults to true in 2.0
> withTempPath { path =>
>   val tempDir = path.getCanonicalPath
>   spark.range(10)
>     .selectExpr("id % 2 as a", "id % 3 as b", "id as c")
>     .write
>     .partitionBy("a")
>     .parquet(tempDir)
>   val df = spark.read.parquet(tempDir)
>   val df1 = df.where("a = 0").groupBy("b").agg("c" -> "sum")
>   val df2 = df.where("a = 1").groupBy("b").agg("c" -> "sum")
>   checkAnswer(df1.join(df2, "b"), Row(0, 6, 12) :: Row(1, 4, 8) :: Row(2, 10, 5) :: Nil)
> }
> {code}
> When exchange reuse is on, the result is
> {code}
> +---+------+------+
> |  b|sum(c)|sum(c)|
> +---+------+------+
> |  0|     6|     6|
> |  1|     4|     4|
> |  2|    10|    10|
> +---+------+------+
> {code}
> The correct result is
> {code}
> +---+------+------+
> |  b|sum(c)|sum(c)|
> +---+------+------+
> |  0|     6|    12|
> |  1|     4|     8|
> |  2|    10|     5|
> +---+------+------+
> {code}
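
To make the cause concrete, below is a small, purely illustrative Scala sketch (the names are invented and do not correspond to Spark's actual scan operator internals): if the pruned partition filters are left out of the equality check backing `sameResult()`, the two scans in df1 and df2 look identical and the exchange is incorrectly reused.

{code}
// Hypothetical illustration only; names do not match Spark internals.
case class ScanFingerprint(
    relationPath: String,           // base file relation being scanned
    output: Seq[String],            // projected columns
    partitionFilters: Seq[String])  // pushed-down partition pruning predicates

def sameScanResult(a: ScanFingerprint, b: ScanFingerprint): Boolean =
  a.relationPath == b.relationPath &&
    a.output == b.output &&
    a.partitionFilters == b.partitionFilters  // the comparison the bug effectively omitted
{code}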



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16818) Exchange reuse incorrectly reuses scans over different sets of partitions

2016-07-30 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400966#comment-15400966
 ] 

Reynold Xin commented on SPARK-16818:
-

I've merged this in master, but this still needs to be merged into branch-2.0.


> Exchange reuse incorrectly reuses scans over different sets of partitions
> -
>
> Key: SPARK-16818
> URL: https://issues.apache.org/jira/browse/SPARK-16818
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Eric Liang
>Assignee: Eric Liang
>Priority: Critical
> Fix For: 2.1.0
>
>
> This happens because the file scan operator does not take into account 
> partition pruning in its implementation of `sameResult()`. As a result, 
> executions may be incorrect on self-joins over the same base file relation. 
> Here's a minimal test case to reproduce:
> {code}
> spark.conf.set("spark.sql.exchange.reuse", true)  // defaults to true in 2.0
> withTempPath { path =>
>   val tempDir = path.getCanonicalPath
>   spark.range(10)
>     .selectExpr("id % 2 as a", "id % 3 as b", "id as c")
>     .write
>     .partitionBy("a")
>     .parquet(tempDir)
>   val df = spark.read.parquet(tempDir)
>   val df1 = df.where("a = 0").groupBy("b").agg("c" -> "sum")
>   val df2 = df.where("a = 1").groupBy("b").agg("c" -> "sum")
>   checkAnswer(df1.join(df2, "b"), Row(0, 6, 12) :: Row(1, 4, 8) :: Row(2, 10, 5) :: Nil)
> }
> {code}
> When exchange reuse is on, the result is
> {code}
> +---+------+------+
> |  b|sum(c)|sum(c)|
> +---+------+------+
> |  0|     6|     6|
> |  1|     4|     4|
> |  2|    10|    10|
> +---+------+------+
> {code}
> The correct result is
> {code}
> +---+------+------+
> |  b|sum(c)|sum(c)|
> +---+------+------+
> |  0|     6|    12|
> |  1|     4|     8|
> |  2|    10|     5|
> +---+------+------+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16818) Exchange reuse incorrectly reuses scans over different sets of partitions

2016-07-30 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-16818:

Fix Version/s: (was: 2.0.1)

> Exchange reuse incorrectly reuses scans over different sets of partitions
> -
>
> Key: SPARK-16818
> URL: https://issues.apache.org/jira/browse/SPARK-16818
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Eric Liang
>Assignee: Eric Liang
>Priority: Critical
> Fix For: 2.1.0
>
>
> This happens because the file scan operator does not take into account 
> partition pruning in its implementation of `sameResult()`. As a result, 
> executions may be incorrect on self-joins over the same base file relation. 
> Here's a minimal test case to reproduce:
> {code}
> spark.conf.set("spark.sql.exchange.reuse", true)  // defaults to true in 2.0
> withTempPath { path =>
>   val tempDir = path.getCanonicalPath
>   spark.range(10)
>     .selectExpr("id % 2 as a", "id % 3 as b", "id as c")
>     .write
>     .partitionBy("a")
>     .parquet(tempDir)
>   val df = spark.read.parquet(tempDir)
>   val df1 = df.where("a = 0").groupBy("b").agg("c" -> "sum")
>   val df2 = df.where("a = 1").groupBy("b").agg("c" -> "sum")
>   checkAnswer(df1.join(df2, "b"), Row(0, 6, 12) :: Row(1, 4, 8) :: Row(2, 10, 5) :: Nil)
> }
> {code}
> When exchange reuse is on, the result is
> {code}
> +---+------+------+
> |  b|sum(c)|sum(c)|
> +---+------+------+
> |  0|     6|     6|
> |  1|     4|     4|
> |  2|    10|    10|
> +---+------+------+
> {code}
> The correct result is
> {code}
> +---+------+------+
> |  b|sum(c)|sum(c)|
> +---+------+------+
> |  0|     6|    12|
> |  1|     4|     8|
> |  2|    10|     5|
> +---+------+------+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-16818) Exchange reuse incorrectly reuses scans over different sets of partitions

2016-07-30 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-16818.
-
   Resolution: Fixed
 Assignee: Eric Liang
Fix Version/s: 2.1.0
   2.0.1

> Exchange reuse incorrectly reuses scans over different sets of partitions
> -
>
> Key: SPARK-16818
> URL: https://issues.apache.org/jira/browse/SPARK-16818
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Eric Liang
>Assignee: Eric Liang
>Priority: Critical
> Fix For: 2.0.1, 2.1.0
>
>
> This happens because the file scan operator does not take into account 
> partition pruning in its implementation of `sameResult()`. As a result, 
> executions may be incorrect on self-joins over the same base file relation. 
> Here's a minimal test case to reproduce:
> {code}
> spark.conf.set("spark.sql.exchange.reuse", true)  // defaults to true in 2.0
> withTempPath { path =>
>   val tempDir = path.getCanonicalPath
>   spark.range(10)
>     .selectExpr("id % 2 as a", "id % 3 as b", "id as c")
>     .write
>     .partitionBy("a")
>     .parquet(tempDir)
>   val df = spark.read.parquet(tempDir)
>   val df1 = df.where("a = 0").groupBy("b").agg("c" -> "sum")
>   val df2 = df.where("a = 1").groupBy("b").agg("c" -> "sum")
>   checkAnswer(df1.join(df2, "b"), Row(0, 6, 12) :: Row(1, 4, 8) :: Row(2, 10, 5) :: Nil)
> }
> {code}
> When exchange reuse is on, the result is
> {code}
> +---+------+------+
> |  b|sum(c)|sum(c)|
> +---+------+------+
> |  0|     6|     6|
> |  1|     4|     4|
> |  2|    10|    10|
> +---+------+------+
> {code}
> The correct result is
> {code}
> +---+------+------+
> |  b|sum(c)|sum(c)|
> +---+------+------+
> |  0|     6|    12|
> |  1|     4|     8|
> |  2|    10|     5|
> +---+------+------+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16275) Implement all the Hive fallback functions

2016-07-30 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400942#comment-15400942
 ] 

Xiao Li commented on SPARK-16275:
-

It sounds like both of you are fine with removing Hive's hash UDF. I will submit 
a PR to resolve it. 

> Implement all the Hive fallback functions
> -
>
> Key: SPARK-16275
> URL: https://issues.apache.org/jira/browse/SPARK-16275
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>
> As of Spark 2.0, Spark falls back to Hive for only the following built-in 
> functions:
> {code}
> "elt", "hash", "java_method", "histogram_numeric",
> "map_keys", "map_values",
> "parse_url", "percentile", "percentile_approx", "reflect", "sentences", 
> "stack", "str_to_map",
> "xpath", "xpath_boolean", "xpath_double", "xpath_float", "xpath_int", 
> "xpath_long",
> "xpath_number", "xpath_short", "xpath_string",
> // table generating function
> "inline", "posexplode"
> {code}
> The goal of the ticket is to implement all of these in Spark so we don't need 
> to fall back into Hive's UDFs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16275) Implement all the Hive fallback functions

2016-07-30 Thread Wenchen Fan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400924#comment-15400924
 ] 

Wenchen Fan commented on SPARK-16275:
-

It's weird to have two hash implementations, one for Hive compatibility and one 
for internal usage (shuffle, bucketing, etc.). I'd like to update those values 
with our own hash function.

> Implement all the Hive fallback functions
> -
>
> Key: SPARK-16275
> URL: https://issues.apache.org/jira/browse/SPARK-16275
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>
> As of Spark 2.0, Spark falls back to Hive for only the following built-in 
> functions:
> {code}
> "elt", "hash", "java_method", "histogram_numeric",
> "map_keys", "map_values",
> "parse_url", "percentile", "percentile_approx", "reflect", "sentences", 
> "stack", "str_to_map",
> "xpath", "xpath_boolean", "xpath_double", "xpath_float", "xpath_int", 
> "xpath_long",
> "xpath_number", "xpath_short", "xpath_string",
> // table generating function
> "inline", "posexplode"
> {code}
> The goal of the ticket is to implement all of these in Spark so we don't need 
> to fall back into Hive's UDFs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16819) Exception in thread “main” org.apache.spark.SparkException: Application application finished with failed status

2016-07-30 Thread Asmaa Ali (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Asmaa Ali  updated SPARK-16819:
---
Description: 
What is the reason for this exception?!

cancerdetector@cluster-cancerdetector-m:~/SparkBWA/build$ spark-submit --class 
SparkBWA --master yarn-cluster --deploy-mode cluster --conf 
spark.yarn.jar=hdfs:///user/spark/spark-assembly.jar --driver-memory 1500m 
--executor-memory 1500m --executor-cores 1 --archives ./bwa.zip --verbose 
./SparkBWA.jar -algorithm mem -reads paired -index /Data/HumanBase/hg38 
-partitions 32 ERR000589_1.filt.fastq ERR000589_2.filt.fastqhb Output_ERR000589
Using properties file: /usr/lib/spark/conf/spark-defaults.conf
Adding default property: 
spark.executor.extraJavaOptions=-Xbootclasspath/p:/usr/local/share/google/alpn/alpn-boot-8.1.7.v20160121.jar
Adding default property: 
spark.history.fs.logDirectory=hdfs://cluster-cancerdetector-m/user/spark/eventlog
Adding default property: spark.eventLog.enabled=true
Adding default property: spark.driver.maxResultSize=1920m
Adding default property: spark.shuffle.service.enabled=true
Adding default property: 
spark.yarn.historyServer.address=cluster-cancerdetector-m:18080
Adding default property: spark.sql.parquet.cacheMetadata=false
Adding default property: spark.driver.memory=3840m
Adding default property: spark.dynamicAllocation.maxExecutors=1
Adding default property: spark.scheduler.minRegisteredResourcesRatio=0.0
Adding default property: spark.yarn.am.memoryOverhead=558
Adding default property: spark.yarn.am.memory=5586m
Adding default property: 
spark.driver.extraJavaOptions=-Xbootclasspath/p:/usr/local/share/google/alpn/alpn-boot-8.1.7.v20160121.jar
Adding default property: spark.master=yarn-client
Adding default property: spark.executor.memory=5586m
Adding default property: 
spark.eventLog.dir=hdfs://cluster-cancerdetector-m/user/spark/eventlog
Adding default property: spark.dynamicAllocation.enabled=true
Adding default property: spark.executor.cores=2
Adding default property: spark.yarn.executor.memoryOverhead=558
Adding default property: spark.dynamicAllocation.minExecutors=1
Adding default property: spark.dynamicAllocation.initialExecutors=1
Adding default property: spark.akka.frameSize=512
Parsed arguments:
  master                  yarn-cluster
  deployMode              cluster
  executorMemory          1500m
  executorCores           1
  totalExecutorCores      null
  propertiesFile          /usr/lib/spark/conf/spark-defaults.conf
  driverMemory            1500m
  driverCores             null
  driverExtraClassPath    null
  driverExtraLibraryPath  null
  driverExtraJavaOptions  -Xbootclasspath/p:/usr/local/share/google/alpn/alpn-boot-8.1.7.v20160121.jar
  supervise               false
  queue                   null
  numExecutors            null
  files                   null
  pyFiles                 null
  archives                file:/home/cancerdetector/SparkBWA/build/./bwa.zip
  mainClass               SparkBWA
  primaryResource         file:/home/cancerdetector/SparkBWA/build/./SparkBWA.jar
  name                    SparkBWA
  childArgs               [-algorithm mem -reads paired -index /Data/HumanBase/hg38 -partitions 32 ERR000589_1.filt.fastq ERR000589_2.filt.fastqhb Output_ERR000589]
  jars                    null
  packages                null
  packagesExclusions      null
  repositories            null
  verbose                 true

Spark properties used, including those specified through
 --conf and those from the properties file 
/usr/lib/spark/conf/spark-defaults.conf:
  spark.yarn.am.memoryOverhead -> 558
  spark.driver.memory -> 1500m
  spark.yarn.jar -> hdfs:///user/spark/spark-assembly.jar
  spark.executor.memory -> 5586m
  spark.yarn.historyServer.address -> cluster-cancerdetector-m:18080
  spark.eventLog.enabled -> true
  spark.scheduler.minRegisteredResourcesRatio -> 0.0
  spark.dynamicAllocation.maxExecutors -> 1
  spark.akka.frameSize -> 512
  spark.executor.extraJavaOptions -> 
-Xbootclasspath/p:/usr/local/share/google/alpn/alpn-boot-8.1.7.v20160121.jar
  spark.sql.parquet.cacheMetadata -> false
  spark.shuffle.service.enabled -> true
  spark.history.fs.logDirectory -> 
hdfs://cluster-cancerdetector-m/user/spark/eventlog
  spark.dynamicAllocation.initialExecutors -> 1
  spark.dynamicAllocation.minExecutors -> 1
  spark.yarn.executor.memoryOverhead -> 558
  spark.driver.extraJavaOptions -> 
-Xbootclasspath/p:/usr/local/share/google/alpn/alpn-boot-8.1.7.v20160121.jar
  spark.eventLog.dir -> hdfs://cluster-cancerdetector-m/user/spark/eventlog
  spark.yarn.am.memory -> 5586m
  spark.driver.maxResultSize -> 1920m
  spark.master -> yarn-client
  spark.dynamicAllocation.enabled -> true
  spark.executor.cores -> 2


Main class:
org.apache.spark.deploy.yarn.Client
Arguments:
--name
SparkBWA
--driver-memory
1500m
--executor-memory
1500m
--executor-cores
1
--archives
file:/home/can

[jira] [Updated] (SPARK-16819) Exception in thread “main” org.apache.spark.SparkException: Application application finished with failed status

2016-07-30 Thread Asmaa Ali (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Asmaa Ali  updated SPARK-16819:
---
Description: 
What is the reason for this exception?!

cancerdetector@cluster-cancerdetector-m:~/SparkBWA/build$ spark-submit --class 
SparkBWA --master yarn-cluster --deploy-mode cluster --conf 
spark.yarn.jar=hdfs:///user/spark/spark-assembly.jar --driver-memory 1500m 
--executor-memory 1500m --executor-cores 1 --archives ./bwa.zip --verbose 
./SparkBWA.jar -algorithm mem -reads paired -index /Data/HumanBase/hg38 
-partitions 32 ERR000589_1.filt.fastq ERR000589_2.filt.fastqhb Output_ERR000589
Using properties file: /usr/lib/spark/conf/spark-defaults.conf
Adding default property: 
spark.executor.extraJavaOptions=-Xbootclasspath/p:/usr/local/share/google/alpn/alpn-boot-8.1.7.v20160121.jar
Adding default property: 
spark.history.fs.logDirectory=hdfs://cluster-cancerdetector-m/user/spark/eventlog
Adding default property: spark.eventLog.enabled=true
Adding default property: spark.driver.maxResultSize=1920m
Adding default property: spark.shuffle.service.enabled=true
Adding default property: 
spark.yarn.historyServer.address=cluster-cancerdetector-m:18080
Adding default property: spark.sql.parquet.cacheMetadata=false
Adding default property: spark.driver.memory=3840m
Adding default property: spark.dynamicAllocation.maxExecutors=1
Adding default property: spark.scheduler.minRegisteredResourcesRatio=0.0
Adding default property: spark.yarn.am.memoryOverhead=558
Adding default property: spark.yarn.am.memory=5586m
Adding default property: 
spark.driver.extraJavaOptions=-Xbootclasspath/p:/usr/local/share/google/alpn/alpn-boot-8.1.7.v20160121.jar
Adding default property: spark.master=yarn-client
Adding default property: spark.executor.memory=5586m
Adding default property: 
spark.eventLog.dir=hdfs://cluster-cancerdetector-m/user/spark/eventlog
Adding default property: spark.dynamicAllocation.enabled=true
Adding default property: spark.executor.cores=2
Adding default property: spark.yarn.executor.memoryOverhead=558
Adding default property: spark.dynamicAllocation.minExecutors=1
Adding default property: spark.dynamicAllocation.initialExecutors=1
Adding default property: spark.akka.frameSize=512
Parsed arguments:
  master                  yarn-cluster
  deployMode              cluster
  executorMemory          1500m
  executorCores           1
  totalExecutorCores      null
  propertiesFile          /usr/lib/spark/conf/spark-defaults.conf
  driverMemory            1500m
  driverCores             null
  driverExtraClassPath    null
  driverExtraLibraryPath  null
  driverExtraJavaOptions  -Xbootclasspath/p:/usr/local/share/google/alpn/alpn-boot-8.1.7.v20160121.jar
  supervise               false
  queue                   null
  numExecutors            null
  files                   null
  pyFiles                 null
  archives                file:/home/cancerdetector/SparkBWA/build/./bwa.zip
  mainClass               SparkBWA
  primaryResource         file:/home/cancerdetector/SparkBWA/build/./SparkBWA.jar
  name                    SparkBWA
  childArgs               [-algorithm mem -reads paired -index /Data/HumanBase/hg38 -partitions 32 ERR000589_1.filt.fastq ERR000589_2.filt.fastqhb Output_ERR000589]
  jars                    null
  packages                null
  packagesExclusions      null
  repositories            null
  verbose                 true

Spark properties used, including those specified through
 --conf and those from the properties file 
/usr/lib/spark/conf/spark-defaults.conf:
  spark.yarn.am.memoryOverhead -> 558
  spark.driver.memory -> 1500m
  spark.yarn.jar -> hdfs:///user/spark/spark-assembly.jar
  spark.executor.memory -> 5586m
  spark.yarn.historyServer.address -> cluster-cancerdetector-m:18080
  spark.eventLog.enabled -> true
  spark.scheduler.minRegisteredResourcesRatio -> 0.0
  spark.dynamicAllocation.maxExecutors -> 1
  spark.akka.frameSize -> 512
  spark.executor.extraJavaOptions -> 
-Xbootclasspath/p:/usr/local/share/google/alpn/alpn-boot-8.1.7.v20160121.jar
  spark.sql.parquet.cacheMetadata -> false
  spark.shuffle.service.enabled -> true
  spark.history.fs.logDirectory -> 
hdfs://cluster-cancerdetector-m/user/spark/eventlog
  spark.dynamicAllocation.initialExecutors -> 1
  spark.dynamicAllocation.minExecutors -> 1
  spark.yarn.executor.memoryOverhead -> 558
  spark.driver.extraJavaOptions -> 
-Xbootclasspath/p:/usr/local/share/google/alpn/alpn-boot-8.1.7.v20160121.jar
  spark.eventLog.dir -> hdfs://cluster-cancerdetector-m/user/spark/eventlog
  spark.yarn.am.memory -> 5586m
  spark.driver.maxResultSize -> 1920m
  spark.master -> yarn-client
  spark.dynamicAllocation.enabled -> true
  spark.executor.cores -> 2


Main class:
org.apache.spark.deploy.yarn.Client
Arguments:
--name
SparkBWA
--driver-memory
1500m
--executor-memory
1500m
--executor-cores
1
--archives
file:/home/can

[jira] [Created] (SPARK-16819) Exception in thread “main” org.apache.spark.SparkException: Application application finished with failed status

2016-07-30 Thread Asmaa Ali (JIRA)
Asmaa Ali  created SPARK-16819:
--

 Summary: Exception in thread “main” 
org.apache.spark.SparkException: Application application finished with failed 
status
 Key: SPARK-16819
 URL: https://issues.apache.org/jira/browse/SPARK-16819
 Project: Spark
  Issue Type: Question
  Components: Streaming, YARN
Reporter: Asmaa Ali 


What is the reason for this exception?!


cancerdetector@cluster-cancerdetector-m:~/SparkBWA/build$ spark-submit --class 
SparkBWA --master yarn-cluster --
deploy-mode cluster --conf spark.yarn.jar=hdfs:///user/spark/spark-assembly.jar 
--driver-memory 1500m --executor-memory 1500m --executor-cores 1 --archives 
./bwa.zip --verbose ./SparkBWA.jar -algorithm mem -reads paired -index 
/Data/HumanBase/hg38 -partitions 32 ERR000589_1.filt.fastq 
ERR000589_2.filt.fastqhb Output_ERR000589-> added --deploy-mode cluster
Using properties file: /usr/lib/spark/conf/spark-defaults.conf
Adding default property: 
spark.executor.extraJavaOptions=-Xbootclasspath/p:/usr/local/share/google/alpn/alpn-boot-8.1.7.v20160121.jar
Adding default property: 
spark.history.fs.logDirectory=hdfs://cluster-cancerdetector-m/user/spark/eventlog
Adding default property: spark.eventLog.enabled=true
Adding default property: spark.driver.maxResultSize=1920m
Adding default property: spark.shuffle.service.enabled=true
Adding default property: 
spark.yarn.historyServer.address=cluster-cancerdetector-m:18080
Adding default property: spark.sql.parquet.cacheMetadata=false
Adding default property: spark.driver.memory=3840m
Adding default property: spark.dynamicAllocation.maxExecutors=1
Adding default property: spark.scheduler.minRegisteredResourcesRatio=0.0
Adding default property: spark.yarn.am.memoryOverhead=558
Adding default property: spark.yarn.am.memory=5586m
Adding default property: 
spark.driver.extraJavaOptions=-Xbootclasspath/p:/usr/local/share/google/alpn/alpn-boot-8.1.7.v20160121.jar
Adding default property: spark.master=yarn-client
Adding default property: spark.executor.memory=5586m
Adding default property: 
spark.eventLog.dir=hdfs://cluster-cancerdetector-m/user/spark/eventlog
Adding default property: spark.dynamicAllocation.enabled=true
Adding default property: spark.executor.cores=2
Adding default property: spark.yarn.executor.memoryOverhead=558
Adding default property: spark.dynamicAllocation.minExecutors=1
Adding default property: spark.dynamicAllocation.initialExecutors=1
Adding default property: spark.akka.frameSize=512
Parsed arguments:
  master                  yarn-cluster
  deployMode              cluster
  executorMemory          1500m
  executorCores           1
  totalExecutorCores      null
  propertiesFile          /usr/lib/spark/conf/spark-defaults.conf
  driverMemory            1500m
  driverCores             null
  driverExtraClassPath    null
  driverExtraLibraryPath  null
  driverExtraJavaOptions  -Xbootclasspath/p:/usr/local/share/google/alpn/alpn-boot-8.1.7.v20160121.jar
  supervise               false
  queue                   null
  numExecutors            null
  files                   null
  pyFiles                 null
  archives                file:/home/cancerdetector/SparkBWA/build/./bwa.zip
  mainClass               SparkBWA
  primaryResource         file:/home/cancerdetector/SparkBWA/build/./SparkBWA.jar
  name                    SparkBWA
  childArgs               [-algorithm mem -reads paired -index /Data/HumanBase/hg38 -partitions 32 ERR000589_1.filt.fastq ERR000589_2.filt.fastqhb Output_ERR000589- --deploy-mode cluster]
  jars                    null
  packages                null
  packagesExclusions      null
  repositories            null
  verbose                 true

Spark properties used, including those specified through
 --conf and those from the properties file 
/usr/lib/spark/conf/spark-defaults.conf:
  spark.yarn.am.memoryOverhead -> 558
  spark.driver.memory -> 1500m
  spark.yarn.jar -> hdfs:///user/spark/spark-assembly.jar
  spark.executor.memory -> 5586m
  spark.yarn.historyServer.address -> cluster-cancerdetector-m:18080
  spark.eventLog.enabled -> true
  spark.scheduler.minRegisteredResourcesRatio -> 0.0
  spark.dynamicAllocation.maxExecutors -> 1
  spark.akka.frameSize -> 512
  spark.executor.extraJavaOptions -> 
-Xbootclasspath/p:/usr/local/share/google/alpn/alpn-boot-8.1.7.v20160121.jar
  spark.sql.parquet.cacheMetadata -> false
  spark.shuffle.service.enabled -> true
  spark.history.fs.logDirectory -> 
hdfs://cluster-cancerdetector-m/user/spark/eventlog
  spark.dynamicAllocation.initialExecutors -> 1
  spark.dynamicAllocation.minExecutors -> 1
  spark.yarn.executor.memoryOverhead -> 558
  spark.driver.extraJavaOptions -> 
-Xbootclasspath/p:/usr/local/share/google/alpn/alpn-boot-8.1.7.v20160121.jar
  spark.eventLog.dir -> hdfs://cluster-cancerdetector-m/user/spark/eventlog
  spark.yarn.am.memory -> 5586m
  spark.driver.maxResul

[jira] [Commented] (SPARK-16475) Broadcast Hint for SQL Queries

2016-07-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400881#comment-15400881
 ] 

Apache Spark commented on SPARK-16475:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/14426

> Broadcast Hint for SQL Queries
> --
>
> Key: SPARK-16475
> URL: https://issues.apache.org/jira/browse/SPARK-16475
> Project: Spark
>  Issue Type: Improvement
>Reporter: Reynold Xin
> Attachments: BroadcastHintinSparkSQL.pdf
>
>
> Broadcast hint is a way for users to manually annotate a query and suggest to 
> the query optimizer the join method. It is very useful when the query 
> optimizer cannot make the optimal decision with respect to join methods due to 
> conservativeness or the lack of proper statistics.
> The DataFrame API has broadcast hint since Spark 1.5. However, we do not have 
> an equivalent functionality in SQL queries. We propose adding Hive-style 
> broadcast hint to Spark SQL.
> For more information, please see the attached document. One note about the 
> doc: in addition to supporting "MAPJOIN", we should also support 
> "BROADCASTJOIN" and "BROADCAST" in the comment, e.g. the following should be 
> accepted:
> {code}
> SELECT /*+ MAPJOIN(b) */ ...
> SELECT /*+ BROADCASTJOIN(b) */ ...
> SELECT /*+ BROADCAST(b) */ ...
> {code}
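
For comparison with the proposed SQL syntax, the DataFrame-side hint mentioned above (available since Spark 1.5) is used like this; a minimal sketch assuming a SparkSession named spark:

{code}
// Minimal sketch of the existing DataFrame API broadcast hint (Spark 1.5+).
// Assumes a SparkSession named `spark`.
import org.apache.spark.sql.functions.broadcast

val large = spark.range(1000).toDF("key")
val small = spark.range(10).toDF("key")
val joined = large.join(broadcast(small), "key")  // hint: broadcast the small side
joined.explain()                                  // plan should show a broadcast join
{code}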



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16818) Exchange reuse incorrectly reuses scans over different sets of partitions

2016-07-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16818:


Assignee: Apache Spark

> Exchange reuse incorrectly reuses scans over different sets of partitions
> -
>
> Key: SPARK-16818
> URL: https://issues.apache.org/jira/browse/SPARK-16818
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Eric Liang
>Assignee: Apache Spark
>Priority: Critical
>
> This happens because the file scan operator does not take into account 
> partition pruning in its implementation of `sameResult()`. As a result, 
> executions may be incorrect on self-joins over the same base file relation. 
> Here's a minimal test case to reproduce:
> {code}
> spark.conf.set("spark.sql.exchange.reuse", true)  // defaults to true in 2.0
> withTempPath { path =>
>   val tempDir = path.getCanonicalPath
>   spark.range(10)
>     .selectExpr("id % 2 as a", "id % 3 as b", "id as c")
>     .write
>     .partitionBy("a")
>     .parquet(tempDir)
>   val df = spark.read.parquet(tempDir)
>   val df1 = df.where("a = 0").groupBy("b").agg("c" -> "sum")
>   val df2 = df.where("a = 1").groupBy("b").agg("c" -> "sum")
>   checkAnswer(df1.join(df2, "b"), Row(0, 6, 12) :: Row(1, 4, 8) :: Row(2, 10, 5) :: Nil)
> }
> {code}
> When exchange reuse is on, the result is
> {code}
> +---+------+------+
> |  b|sum(c)|sum(c)|
> +---+------+------+
> |  0|     6|     6|
> |  1|     4|     4|
> |  2|    10|    10|
> +---+------+------+
> {code}
> The correct result is
> {code}
> +---+------+------+
> |  b|sum(c)|sum(c)|
> +---+------+------+
> |  0|     6|    12|
> |  1|     4|     8|
> |  2|    10|     5|
> +---+------+------+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16818) Exchange reuse incorrectly reuses scans over different sets of partitions

2016-07-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400853#comment-15400853
 ] 

Apache Spark commented on SPARK-16818:
--

User 'ericl' has created a pull request for this issue:
https://github.com/apache/spark/pull/14425

> Exchange reuse incorrectly reuses scans over different sets of partitions
> -
>
> Key: SPARK-16818
> URL: https://issues.apache.org/jira/browse/SPARK-16818
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Eric Liang
>Priority: Critical
>
> This happens because the file scan operator does not take into account 
> partition pruning in its implementation of `sameResult()`. As a result, 
> executions may be incorrect on self-joins over the same base file relation. 
> Here's a minimal test case to reproduce:
> {code}
> spark.conf.set("spark.sql.exchange.reuse", true)  // defaults to true in 2.0
> withTempPath { path =>
>   val tempDir = path.getCanonicalPath
>   spark.range(10)
>     .selectExpr("id % 2 as a", "id % 3 as b", "id as c")
>     .write
>     .partitionBy("a")
>     .parquet(tempDir)
>   val df = spark.read.parquet(tempDir)
>   val df1 = df.where("a = 0").groupBy("b").agg("c" -> "sum")
>   val df2 = df.where("a = 1").groupBy("b").agg("c" -> "sum")
>   checkAnswer(df1.join(df2, "b"), Row(0, 6, 12) :: Row(1, 4, 8) :: Row(2, 10, 5) :: Nil)
> }
> {code}
> When exchange reuse is on, the result is
> {code}
> +---+------+------+
> |  b|sum(c)|sum(c)|
> +---+------+------+
> |  0|     6|     6|
> |  1|     4|     4|
> |  2|    10|    10|
> +---+------+------+
> {code}
> The correct result is
> {code}
> +---+------+------+
> |  b|sum(c)|sum(c)|
> +---+------+------+
> |  0|     6|    12|
> |  1|     4|     8|
> |  2|    10|     5|
> +---+------+------+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16818) Exchange reuse incorrectly reuses scans over different sets of partitions

2016-07-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16818:


Assignee: (was: Apache Spark)

> Exchange reuse incorrectly reuses scans over different sets of partitions
> -
>
> Key: SPARK-16818
> URL: https://issues.apache.org/jira/browse/SPARK-16818
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Eric Liang
>Priority: Critical
>
> This happens because the file scan operator does not take into account 
> partition pruning in its implementation of `sameResult()`. As a result, 
> executions may be incorrect on self-joins over the same base file relation. 
> Here's a minimal test case to reproduce:
> {code}
> spark.conf.set("spark.sql.exchange.reuse", true)  // defaults to true in 2.0
> withTempPath { path =>
>   val tempDir = path.getCanonicalPath
>   spark.range(10)
>     .selectExpr("id % 2 as a", "id % 3 as b", "id as c")
>     .write
>     .partitionBy("a")
>     .parquet(tempDir)
>   val df = spark.read.parquet(tempDir)
>   val df1 = df.where("a = 0").groupBy("b").agg("c" -> "sum")
>   val df2 = df.where("a = 1").groupBy("b").agg("c" -> "sum")
>   checkAnswer(df1.join(df2, "b"), Row(0, 6, 12) :: Row(1, 4, 8) :: Row(2, 10, 5) :: Nil)
> }
> {code}
> When exchange reuse is on, the result is
> {code}
> +---+------+------+
> |  b|sum(c)|sum(c)|
> +---+------+------+
> |  0|     6|     6|
> |  1|     4|     4|
> |  2|    10|    10|
> +---+------+------+
> {code}
> The correct result is
> {code}
> +---+------+------+
> |  b|sum(c)|sum(c)|
> +---+------+------+
> |  0|     6|    12|
> |  1|     4|     8|
> |  2|    10|     5|
> +---+------+------+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16275) Implement all the Hive fallback functions

2016-07-30 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400854#comment-15400854
 ] 

Xiao Li commented on SPARK-16275:
-

https://github.com/apache/hive/blob/15bdce43db4624a63be1f648e46d1f2baa1c67de/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java#L638-L748

This is Hive's hash function. The implementation looks OK, but I might need to 
check it with [~cloud_fan]. Not all data types (e.g. Union) are supported, and 
the behavior is highly dependent on the data types. I am not exactly sure whether 
we have the same value ranges for each data type, so making sure the two always 
generate the same result could require a lot of test cases. 
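
As a point of comparison, Spark already ships a native Murmur3-based hash expression. A minimal, illustrative way to inspect its output (assuming a SparkSession named spark; no specific values are claimed here, and they will generally differ from Hive's ObjectInspectorUtils-based hash linked above):

{code}
// Illustrative sketch only; assumes a SparkSession named `spark`.
// Spark's functions.hash is Murmur3-based, so its results generally differ
// from Hive's ObjectInspectorUtils.hashCode-based hash UDF.
import org.apache.spark.sql.functions.{col, hash}

spark.range(3).select(col("id"), hash(col("id"))).show()
{code}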

> Implement all the Hive fallback functions
> -
>
> Key: SPARK-16275
> URL: https://issues.apache.org/jira/browse/SPARK-16275
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>
> As of Spark 2.0, Spark falls back to Hive for only the following built-in 
> functions:
> {code}
> "elt", "hash", "java_method", "histogram_numeric",
> "map_keys", "map_values",
> "parse_url", "percentile", "percentile_approx", "reflect", "sentences", 
> "stack", "str_to_map",
> "xpath", "xpath_boolean", "xpath_double", "xpath_float", "xpath_int", 
> "xpath_long",
> "xpath_number", "xpath_short", "xpath_string",
> // table generating function
> "inline", "posexplode"
> {code}
> The goal of the ticket is to implement all of these in Spark so we don't need 
> to fall back into Hive's UDFs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16818) Exchange reuse incorrectly reuses scans over different sets of partitions

2016-07-30 Thread Eric Liang (JIRA)
Eric Liang created SPARK-16818:
--

 Summary: Exchange reuse incorrectly reuses scans over different 
sets of partitions
 Key: SPARK-16818
 URL: https://issues.apache.org/jira/browse/SPARK-16818
 Project: Spark
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Eric Liang
Priority: Critical


This happens because the file scan operator does not take into account 
partition pruning in its implementation of `sameResult()`. As a result, 
executions may be incorrect on self-joins over the same base file relation. 
Here's a minimal test case to reproduce:

{code}
spark.conf.set("spark.sql.exchange.reuse", true)  // defaults to true in 2.0
withTempPath { path =>
  val tempDir = path.getCanonicalPath
  spark.range(10)
    .selectExpr("id % 2 as a", "id % 3 as b", "id as c")
    .write
    .partitionBy("a")
    .parquet(tempDir)
  val df = spark.read.parquet(tempDir)
  val df1 = df.where("a = 0").groupBy("b").agg("c" -> "sum")
  val df2 = df.where("a = 1").groupBy("b").agg("c" -> "sum")
  checkAnswer(df1.join(df2, "b"), Row(0, 6, 12) :: Row(1, 4, 8) :: Row(2, 10, 5) :: Nil)
}
{code}

When exchange reuse is on, the result is
{code}
+---+------+------+
|  b|sum(c)|sum(c)|
+---+------+------+
|  0|     6|     6|
|  1|     4|     4|
|  2|    10|    10|
+---+------+------+
{code}

The correct result is
{code}
+---+------+------+
|  b|sum(c)|sum(c)|
+---+------+------+
|  0|     6|    12|
|  1|     4|     8|
|  2|    10|     5|
+---+------+------+
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16275) Implement all the Hive fallback functions

2016-07-30 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400849#comment-15400849
 ] 

Xiao Li commented on SPARK-16275:
-

Let me check it. Thanks!

> Implement all the Hive fallback functions
> -
>
> Key: SPARK-16275
> URL: https://issues.apache.org/jira/browse/SPARK-16275
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>
> As of Spark 2.0, Spark falls back to Hive for only the following built-in 
> functions:
> {code}
> "elt", "hash", "java_method", "histogram_numeric",
> "map_keys", "map_values",
> "parse_url", "percentile", "percentile_approx", "reflect", "sentences", 
> "stack", "str_to_map",
> "xpath", "xpath_boolean", "xpath_double", "xpath_float", "xpath_int", 
> "xpath_long",
> "xpath_number", "xpath_short", "xpath_string",
> // table generating function
> "inline", "posexplode"
> {code}
> The goal of the ticket is to implement all of these in Spark so we don't need 
> to fall back into Hive's UDFs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16275) Implement all the Hive fallback functions

2016-07-30 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400846#comment-15400846
 ] 

Reynold Xin commented on SPARK-16275:
-

How difficult would it be to provide a native hash implementation that
returns the same result?

If it is difficult, I'm fine with us updating all of those to the values
returned by our own native hash function.




> Implement all the Hive fallback functions
> -
>
> Key: SPARK-16275
> URL: https://issues.apache.org/jira/browse/SPARK-16275
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>
> As of Spark 2.0, Spark falls back to Hive for only the following built-in 
> functions:
> {code}
> "elt", "hash", "java_method", "histogram_numeric",
> "map_keys", "map_values",
> "parse_url", "percentile", "percentile_approx", "reflect", "sentences", 
> "stack", "str_to_map",
> "xpath", "xpath_boolean", "xpath_double", "xpath_float", "xpath_int", 
> "xpath_long",
> "xpath_number", "xpath_short", "xpath_string",
> // table generating function
> "inline", "posexplode"
> {code}
> The goal of the ticket is to implement all of these in Spark so we don't need 
> to fall back into Hive's UDFs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16275) Implement all the Hive fallback functions

2016-07-30 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400844#comment-15400844
 ] 

Xiao Li commented on SPARK-16275:
-

Yeah, many queries are using it. Below is the list:

auto_join_nulls
auto_join0
auto_join1
auto_join2
auto_join3
auto_join4
auto_join5
auto_join6
auto_join7
auto_join8
auto_join9
auto_join10
auto_join11
auto_join12
auto_join13
auto_join14
auto_join14_hadoop20
auto_join15
auto_join17
auto_join18
auto_join19
auto_join20
auto_join22
auto_join25
auto_join30
auto_join31
correlationoptimizer1
correlationoptimizer2
correlationoptimizer3
correlationoptimizer4
multiMapJoin1
orc_dictionary_threshold
udf_hash

> Implement all the Hive fallback functions
> -
>
> Key: SPARK-16275
> URL: https://issues.apache.org/jira/browse/SPARK-16275
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>
> As of Spark 2.0, Spark falls back to Hive for only the following built-in 
> functions:
> {code}
> "elt", "hash", "java_method", "histogram_numeric",
> "map_keys", "map_values",
> "parse_url", "percentile", "percentile_approx", "reflect", "sentences", 
> "stack", "str_to_map",
> "xpath", "xpath_boolean", "xpath_double", "xpath_float", "xpath_int", 
> "xpath_long",
> "xpath_number", "xpath_short", "xpath_string",
> // table generating function
> "inline", "posexplode"
> {code}
> The goal of the ticket is to implement all of these in Spark so we don't need 
> to fall back into Hive's UDFs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16275) Implement all the Hive fallback functions

2016-07-30 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400783#comment-15400783
 ] 

Reynold Xin commented on SPARK-16275:
-

What do we use Hive's hash function for? Are there queries in the Hive 
compatibility suite that are using it?


> Implement all the Hive fallback functions
> -
>
> Key: SPARK-16275
> URL: https://issues.apache.org/jira/browse/SPARK-16275
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>
> As of Spark 2.0, Spark falls back to Hive for only the following built-in 
> functions:
> {code}
> "elt", "hash", "java_method", "histogram_numeric",
> "map_keys", "map_values",
> "parse_url", "percentile", "percentile_approx", "reflect", "sentences", 
> "stack", "str_to_map",
> "xpath", "xpath_boolean", "xpath_double", "xpath_float", "xpath_int", 
> "xpath_long",
> "xpath_number", "xpath_short", "xpath_string",
> // table generating function
> "inline", "posexplode"
> {code}
> The goal of the ticket is to implement all of these in Spark so we don't need 
> to fall back into Hive's UDFs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16817) Enable storing of shuffle data in Alluxio

2016-07-30 Thread Tim Bisson (JIRA)
Tim Bisson created SPARK-16817:
--

 Summary: Enable storing of shuffle data in Alluxio
 Key: SPARK-16817
 URL: https://issues.apache.org/jira/browse/SPARK-16817
 Project: Spark
  Issue Type: New Feature
Reporter: Tim Bisson


If one is using Alluxio for storage, it would also be useful if Spark could store 
shuffle spill data in Alluxio. For example:
spark.local.dir="alluxio://host:port/path"

Several users on the Alluxio mailing list have asked for this feature:
https://groups.google.com/forum/?fromgroups#!searchin/alluxio-users/shuffle$20spark|sort:relevance/alluxio-users/90pRZWRVi0s/mgLWLS5aAgAJ
https://groups.google.com/forum/?fromgroups#!searchin/alluxio-users/shuffle$20spark|sort:relevance/alluxio-users/s9H93PnDebw/v_1_FMjR7vEJ



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-5992) Locality Sensitive Hashing (LSH) for MLlib

2016-07-30 Thread snehil suresh wakchaure (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400751#comment-15400751
 ] 

snehil suresh wakchaure edited comment on SPARK-5992 at 7/30/16 5:43 PM:
-

Hello, just curious to know if I can contribute to this project too, although I 
am new at it. I can use some pointers to get started. Is this going to be a 
Scala, Java, or Python codebase?

Any updates from the Uber community? 


was (Author: snehil.w):
Hello, just curious to know if I can contribute to this project too although I 
am new at it. I Can use some pointers to get started.

Any updates from the Uber community? 

> Locality Sensitive Hashing (LSH) for MLlib
> --
>
> Key: SPARK-5992
> URL: https://issues.apache.org/jira/browse/SPARK-5992
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Joseph K. Bradley
>
> Locality Sensitive Hashing (LSH) would be very useful for ML.  It would be 
> great to discuss some possible algorithms here, choose an API, and make a PR 
> for an initial algorithm.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16518) Schema Compatibility of Parquet Data Source

2016-07-30 Thread Chanh Le (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400755#comment-15400755
 ] 

Chanh Le edited comment on SPARK-16518 at 7/30/16 5:38 PM:
---

Do we have a patch for that? Why didn't we catch this case before the release?
If I change int to bigint it's fine, but if I use int it throws the error.

{code}
CREATE EXTERNAL TABLE os (os_id bigint, os_name String)
STORED AS PARQUET LOCATION 'alluxio://master2:19998/etl_info/OS';
{code}
0: jdbc:hive2://master1:1> select * from os limit 1;
+--------+----------+
| os_id  | os_name  |
+--------+----------+
| 15     | Solaris  |
+--------+----------+
1 row selected (0.514 seconds)
{code}
CREATE EXTERNAL TABLE os (os_id int, os_name String)
STORED AS PARQUET LOCATION 'alluxio://master2:19998/etl_info/OS';
{code}
-> throws the same error.


was (Author: giaosuddau):
Did we have a patch for that?
Right now I have this error too.


> Schema Compatibility of Parquet Data Source
> ---
>
> Key: SPARK-16518
> URL: https://issues.apache.org/jira/browse/SPARK-16518
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> Currently, we are not checking the schema compatibility. Different file 
> formats behave differently. This JIRA just summarizes what I observed for 
> parquet data source tables.
> *Scenario 1 Data type mismatch*:
> The existing schema is {{(col1 int, col2 string)}}
> The schema of appending dataset is {{(col1 int, col2 int)}}
> *Case 1*: _when {{spark.sql.parquet.mergeSchema}} is {{false}}_, the error we 
> got:
> {noformat}
> Job aborted due to stage failure: Task 0 in stage 4.0 failed 1 times, most 
> recent failure:
>  Lost task 0.0 in stage 4.0 (TID 4, localhost): java.lang.NullPointerException
>   at 
> org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getInt(OnHeapColumnVector.java:231)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(generated.java:62)
> {noformat}
> *Case 2*: _when {{spark.sql.parquet.mergeSchema}} is {{true}}_, the error we 
> got:
> {noformat}
> Job aborted due to stage failure: Task 0 in stage 3.0 failed 1 times, most 
> recent failure: Lost task 0.0 in stage 3.0 (TID 3, localhost): 
> org.apache.spark.SparkException:
>  Failed merging schema of file 
> file:/private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-4c2f0b69-ee05-4be1-91f0-0e54f89f2308/part-r-0-6b76638c-a624-444c-9479-3c8e894cb65e.snappy.parquet:
> root
>  |-- a: integer (nullable = false)
>  |-- b: string (nullable = true)
> {noformat}
> *Scenario 2 More columns in append dataset*:
> The existing schema is {{(col1 int, col2 string)}}
> The schema of appending dataset is {{(col1 int, col2 string, col3 int)}}
> *Case 1*: _when {{spark.sql.parquet.mergeSchema}} is {{false}}_, the schema 
> of the resultset is {{(col1 int, col2 string)}}.
> *Case 2*: _when {{spark.sql.parquet.mergeSchema}} is {{true}}_, the schema of 
> the resultset is {{(col1 int, col2 string, col3 int)}}.
> *Scenario 3 Less columns in append dataset*:
> The existing schema is {{(col1 int, col2 string)}}
> The schema of appending dataset is {{(col1 int)}}
>*Case 1*: _when {{spark.sql.parquet.mergeSchema}} is {{false}}_, the 
> schema of the resultset is {{(col1 int, col2 string)}}.
>*Case 2*: _when {{spark.sql.parquet.mergeSchema}} is {{true}}_, the schema 
> of the resultset is {{(col1 int)}}.
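
To make Scenario 1 easy to try, here is a minimal reproduction sketch; it is not taken from the ticket, assumes a SparkSession named spark and an arbitrary scratch path, and the exact failure depends on {{spark.sql.parquet.mergeSchema}} as described above:

{code}
// Minimal reproduction sketch for Scenario 1 (data type mismatch on append).
// Assumes a SparkSession named `spark`; the path is an arbitrary scratch location.
import spark.implicits._

val path = "/tmp/spark16518_repro"
Seq((1, "a")).toDF("col1", "col2").write.mode("overwrite").parquet(path)  // col2 is string
Seq((2, 3)).toDF("col1", "col2").write.mode("append").parquet(path)       // col2 is int
spark.conf.set("spark.sql.parquet.mergeSchema", "false")
spark.read.parquet(path).show()  // expected to fail at read time, as described above
{code}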



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16275) Implement all the Hive fallback functions

2016-07-30 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400756#comment-15400756
 ] 

Xiao Li commented on SPARK-16275:
-

[~rxin] What is the plan for {{hash}}? If we use our version, it breaks a lot 
of test cases in {{HiveCompatibilitySuite}}. To resolve those test failures, we 
can migrate them into a separate test suite based on our hash function. This is 
just a labor-intensive job. Do you think this is OK?

Thanks!

> Implement all the Hive fallback functions
> -
>
> Key: SPARK-16275
> URL: https://issues.apache.org/jira/browse/SPARK-16275
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>
> As of Spark 2.0, Spark falls back to Hive for only the following built-in 
> functions:
> {code}
> "elt", "hash", "java_method", "histogram_numeric",
> "map_keys", "map_values",
> "parse_url", "percentile", "percentile_approx", "reflect", "sentences", 
> "stack", "str_to_map",
> "xpath", "xpath_boolean", "xpath_double", "xpath_float", "xpath_int", 
> "xpath_long",
> "xpath_number", "xpath_short", "xpath_string",
> // table generating function
> "inline", "posexplode"
> {code}
> The goal of the ticket is to implement all of these in Spark so we don't need 
> to fall back into Hive's UDFs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-5992) Locality Sensitive Hashing (LSH) for MLlib

2016-07-30 Thread snehil suresh wakchaure (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400751#comment-15400751
 ] 

snehil suresh wakchaure edited comment on SPARK-5992 at 7/30/16 5:28 PM:
-

Hello, just curious to know if I can contribute to this project too, although I 
am new at it. I can use some pointers to get started and on where we are right 
now with this feature design.

Any updates from the Uber community? 


was (Author: snehil.w):
Hello, just curious to know if I can contribute to this project too although I 
am new at it. I Can use some pointers to get started & where we are at right 
now with this feature design.

> Locality Sensitive Hashing (LSH) for MLlib
> --
>
> Key: SPARK-5992
> URL: https://issues.apache.org/jira/browse/SPARK-5992
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Joseph K. Bradley
>
> Locality Sensitive Hashing (LSH) would be very useful for ML.  It would be 
> great to discuss some possible algorithms here, choose an API, and make a PR 
> for an initial algorithm.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16518) Schema Compatibility of Parquet Data Source

2016-07-30 Thread Chanh Le (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400755#comment-15400755
 ] 

Chanh Le commented on SPARK-16518:
--

Do we have a patch for that?
Right now I am hitting this error too.


> Schema Compatibility of Parquet Data Source
> ---
>
> Key: SPARK-16518
> URL: https://issues.apache.org/jira/browse/SPARK-16518
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> Currently, we are not checking the schema compatibility. Different file 
> formats behave differently. This JIRA just summarizes what I observed for 
> parquet data source tables.
> *Scenario 1 Data type mismatch*:
> The existing schema is {{(col1 int, col2 string)}}
> The schema of appending dataset is {{(col1 int, col2 int)}}
> *Case 1*: _when {{spark.sql.parquet.mergeSchema}} is {{false}}_, the error we 
> got:
> {noformat}
> Job aborted due to stage failure: Task 0 in stage 4.0 failed 1 times, most 
> recent failure:
>  Lost task 0.0 in stage 4.0 (TID 4, localhost): java.lang.NullPointerException
>   at 
> org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.getInt(OnHeapColumnVector.java:231)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(generated.java:62)
> {noformat}
> *Case 2*: _when {{spark.sql.parquet.mergeSchema}} is {{true}}_, the error we 
> got:
> {noformat}
> Job aborted due to stage failure: Task 0 in stage 3.0 failed 1 times, most 
> recent failure: Lost task 0.0 in stage 3.0 (TID 3, localhost): 
> org.apache.spark.SparkException:
>  Failed merging schema of file 
> file:/private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-4c2f0b69-ee05-4be1-91f0-0e54f89f2308/part-r-0-6b76638c-a624-444c-9479-3c8e894cb65e.snappy.parquet:
> root
>  |-- a: integer (nullable = false)
>  |-- b: string (nullable = true)
> {noformat}
> *Scenario 2 More columns in append dataset*:
> The existing schema is {{(col1 int, col2 string)}}
> The schema of appending dataset is {{(col1 int, col2 string, col3 int)}}
> *Case 1*: _when {{spark.sql.parquet.mergeSchema}} is {{false}}_, the schema 
> of the resultset is {{(col1 int, col2 string)}}.
> *Case 2*: _when {{spark.sql.parquet.mergeSchema}} is {{true}}_, the schema of 
> the resultset is {{(col1 int, col2 string, col3 int)}}.
> *Scenario 3 Less columns in append dataset*:
> The existing schema is {{(col1 int, col2 string)}}
> The schema of appending dataset is {{(col1 int)}}
>*Case 1*: _when {{spark.sql.parquet.mergeSchema}} is {{false}}_, the 
> schema of the resultset is {{(col1 int, col2 string)}}.
>*Case 2*: _when {{spark.sql.parquet.mergeSchema}} is {{true}}_, the schema 
> of the resultset is {{(col1 int)}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-5992) Locality Sensitive Hashing (LSH) for MLlib

2016-07-30 Thread snehil suresh wakchaure (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400751#comment-15400751
 ] 

snehil suresh wakchaure edited comment on SPARK-5992 at 7/30/16 5:31 PM:
-

Hello, just curious to know whether I can contribute to this project too, 
although I am new to it. I could use some pointers to get started.

Any updates from the Uber community? 


was (Author: snehil.w):
Hello, just curious to know if I can contribute to this project too although I 
am new at it. I Can use some pointers to get started & where we are at right 
now with this feature design.

Any updates from the Uber community? 

> Locality Sensitive Hashing (LSH) for MLlib
> --
>
> Key: SPARK-5992
> URL: https://issues.apache.org/jira/browse/SPARK-5992
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Joseph K. Bradley
>
> Locality Sensitive Hashing (LSH) would be very useful for ML.  It would be 
> great to discuss some possible algorithms here, choose an API, and make a PR 
> for an initial algorithm.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5992) Locality Sensitive Hashing (LSH) for MLlib

2016-07-30 Thread snehil suresh wakchaure (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400751#comment-15400751
 ] 

snehil suresh wakchaure commented on SPARK-5992:


Hello, just curious to know whether I can contribute to this project too, 
although I am new to it. I could use some pointers to get started and on where 
we are right now with this feature design.

> Locality Sensitive Hashing (LSH) for MLlib
> --
>
> Key: SPARK-5992
> URL: https://issues.apache.org/jira/browse/SPARK-5992
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Joseph K. Bradley
>
> Locality Sensitive Hashing (LSH) would be very useful for ML.  It would be 
> great to discuss some possible algorithms here, choose an API, and make a PR 
> for an initial algorithm.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-16800) Fix Java Examples that throw exception

2016-07-30 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-16800.
---
   Resolution: Fixed
Fix Version/s: 2.1.0
   2.0.1

Issue resolved by pull request 14405
[https://github.com/apache/spark/pull/14405]

> Fix Java Examples that throw exception
> --
>
> Key: SPARK-16800
> URL: https://issues.apache.org/jira/browse/SPARK-16800
> Project: Spark
>  Issue Type: Sub-task
>  Components: Examples, ML
>Affects Versions: 2.0.0
>Reporter: Bryan Cutler
>Priority: Minor
> Fix For: 2.0.1, 2.1.0
>
>
> Some Java examples fail to run due to an exception thrown when using 
> mllib.linalg.Vectors instead of ml.linalg.Vectors.  Also, some have incorrect 
> data types defined in the schema that cause an exception.
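
For context, a minimal Scala sketch of the distinction involved (not taken from 
the examples being fixed; it assumes a spark-shell session where {{spark}} is 
available): in Spark 2.0 the DataFrame-based {{spark.ml}} API expects vectors 
from {{org.apache.spark.ml.linalg}}, and passing the old 
{{org.apache.spark.mllib.linalg}} vectors leads to a schema/UDT mismatch at 
runtime.

{code}
// Use the new ml.linalg vector type with DataFrame-based ML in Spark 2.0
import org.apache.spark.ml.linalg.Vectors
// import org.apache.spark.mllib.linalg.Vectors   // old RDD-based type; wrong for spark.ml here

val training = spark.createDataFrame(Seq(
  (1.0, Vectors.dense(0.0, 1.1, 0.1)),
  (0.0, Vectors.dense(2.0, 1.0, -1.0))
)).toDF("label", "features")

training.printSchema()   // "features" carries the ml VectorUDT expected by spark.ml estimators
{code}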



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16800) Fix Java Examples that throw exception

2016-07-30 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-16800:
--
Assignee: Bryan Cutler

> Fix Java Examples that throw exception
> --
>
> Key: SPARK-16800
> URL: https://issues.apache.org/jira/browse/SPARK-16800
> Project: Spark
>  Issue Type: Sub-task
>  Components: Examples, ML
>Affects Versions: 2.0.0
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Minor
> Fix For: 2.0.1, 2.1.0
>
>
> Some Java examples fail to run due to an exception thrown when using 
> mllib.linalg.Vectors instead of ml.linalg.Vectors.  Also, some have incorrect 
> data types defined in the schema that cause an exception.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16696) unused broadcast variables should call destroy instead of unpersist

2016-07-30 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-16696:
--
 Assignee: Weichen Xu
Affects Version/s: (was: 2.1.0)
 Priority: Minor  (was: Major)

> unused broadcast variables should call destroy instead of unpersist
> ---
>
> Key: SPARK-16696
> URL: https://issues.apache.org/jira/browse/SPARK-16696
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib
>Affects Versions: 2.0.1
>Reporter: Weichen Xu
>Assignee: Weichen Xu
>Priority: Minor
> Fix For: 2.1.0
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> Unused broadcast variables should call destroy() instead of unpersist() so 
> that the memory can be released in time, even on the driver side.
> Currently, several algorithms in ML, such as KMeans and Word2Vec, have this 
> problem.
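
For illustration, a minimal sketch of the pattern being changed (not from the 
pull request; it assumes a running {{SparkContext}} named {{sc}}):

{code}
// Broadcast some small lookup data, use it on executors, then release it
val centers = sc.broadcast(Array(1.0, 2.0, 3.0))

val assigned = sc.parallelize(Seq(0.9, 2.1, 3.2)).map { x =>
  centers.value.minBy(c => math.abs(c - x))   // executors read the broadcast value
}.collect()

// destroy() releases the broadcast on the driver as well as on the executors;
// unpersist() would only drop the executor-side blocks and keep it reusable
centers.destroy()
{code}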



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-16696) unused broadcast variables should call destroy instead of unpersist

2016-07-30 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-16696.
---
   Resolution: Fixed
Fix Version/s: 2.1.0

Issue resolved by pull request 14333
[https://github.com/apache/spark/pull/14333]

> unused broadcast variables should call destroy instead of unpersist
> ---
>
> Key: SPARK-16696
> URL: https://issues.apache.org/jira/browse/SPARK-16696
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib
>Affects Versions: 2.0.1
>Reporter: Weichen Xu
> Fix For: 2.1.0
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> Unused broadcast variables should call destroy() instead of unpersist() so 
> that the memory can be released in time, even on the driver side.
> Currently, several algorithms in ML, such as KMeans and Word2Vec, have this 
> problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16816) Add api to get JavaSparkContext from SparkSession

2016-07-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400681#comment-15400681
 ] 

Apache Spark commented on SPARK-16816:
--

User 'phalodi' has created a pull request for this issue:
https://github.com/apache/spark/pull/14421

> Add api to get JavaSparkContext from SparkSession
> -
>
> Key: SPARK-16816
> URL: https://issues.apache.org/jira/browse/SPARK-16816
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: sandeep purohit
>Priority: Minor
>  Labels: patch
> Fix For: 2.0.0
>
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> With this improvement, the user can directly get the JavaSparkContext from 
> the SparkSession.
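
For illustration, a sketch of the workaround available today and of what the 
change would avoid (a sketch only, not the proposed API; the app name and 
master are placeholders):

{code}
import org.apache.spark.api.java.JavaSparkContext
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("demo").master("local[*]").getOrCreate()

// Today: wrap the underlying Scala SparkContext manually
val jsc = new JavaSparkContext(spark.sparkContext)

// The improvement proposes exposing an accessor on SparkSession directly,
// so Java users would not need to build the wrapper themselves.
{code}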



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16816) Add api to get JavaSparkContext from SparkSession

2016-07-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16816:


Assignee: Apache Spark

> Add api to get JavaSparkContext from SparkSession
> -
>
> Key: SPARK-16816
> URL: https://issues.apache.org/jira/browse/SPARK-16816
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: sandeep purohit
>Assignee: Apache Spark
>Priority: Minor
>  Labels: patch
> Fix For: 2.0.0
>
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> With this improvement, the user can directly get the JavaSparkContext from 
> the SparkSession.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16816) Add api to get JavaSparkContext from SparkSession

2016-07-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16816:


Assignee: (was: Apache Spark)

> Add api to get JavaSparkContext from SparkSession
> -
>
> Key: SPARK-16816
> URL: https://issues.apache.org/jira/browse/SPARK-16816
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: sandeep purohit
>Priority: Minor
>  Labels: patch
> Fix For: 2.0.0
>
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> With this improvement, the user can directly get the JavaSparkContext from 
> the SparkSession.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16816) Add api to get JavaSparkContext from SparkSession

2016-07-30 Thread sandeep purohit (JIRA)
sandeep purohit created SPARK-16816:
---

 Summary: Add api to get JavaSparkContext from SparkSession
 Key: SPARK-16816
 URL: https://issues.apache.org/jira/browse/SPARK-16816
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.0.0
Reporter: sandeep purohit
Priority: Minor
 Fix For: 2.0.0


With this improvement, the user can directly get the JavaSparkContext from the 
SparkSession.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14204) [SQL] Failure to register URL-derived JDBC driver on executors in cluster mode

2016-07-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400668#comment-15400668
 ] 

Apache Spark commented on SPARK-14204:
--

User 'mchalek' has created a pull request for this issue:
https://github.com/apache/spark/pull/14420

> [SQL] Failure to register URL-derived JDBC driver on executors in cluster mode
> --
>
> Key: SPARK-14204
> URL: https://issues.apache.org/jira/browse/SPARK-14204
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1
>Reporter: Kevin McHale
>Assignee: Kevin McHale
>  Labels: JDBC, SQL
> Fix For: 1.6.2
>
>
> DataFrameReader JDBC methods throw an IllegalStateException when:
>   1. the JDBC driver is contained in a user-provided jar, and
>   2. the user does not specify which driver to use, but rather allows spark 
> to determine the driver from the JDBC URL.
> This broke some of our database ETL jobs at @premisedata when we upgraded 
> from 1.6.0 to 1.6.1.
> I have tracked the problem down to a regression introduced in the fix for 
> SPARK-12579: 
> https://github.com/apache/spark/commit/7f37c1e45d52b7823d566349e2be21366d73651f#diff-391379a5ec51082e2ae1209db15c02b3R53
> The issue is that DriverRegistry.register is not called on the executors for 
> a JDBC driver that is derived from the JDBC path.
> The problem can be demonstrated within spark-shell, provided you're in 
> cluster mode and you've deployed a JDBC driver (e.g. postgresql.Driver) via 
> the --jars argument:
> {code}
> import 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.createConnectionFactory
> val factory = 
> createConnectionFactory("jdbc:postgresql://whatever.you.want/database?user=user&password=password",
>  new java.util.Properties)
> sc.parallelize(1 to 100).foreach { _ => factory() } // throws exception
> {code}
> A sufficient fix is to apply DriverRegistry.register to the `driverClass` 
> variable, rather than to `userSpecifiedDriverClass`, at the code link 
> provided above.  I will submit a PR for this shortly.
> In the meantime, a temporary workaround is to manually specify the JDBC 
> driver class in the Properties object passed to DataFrameReader.jdbc, or in 
> the options used in other entry points, which will force the executors to 
> register the class properly.
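
A sketch of the temporary workaround described in the last paragraph (assuming 
the PostgreSQL driver jar is supplied via {{--jars}}; the URL, credentials and 
table name are placeholders):

{code}
import java.util.Properties

// Naming the driver class explicitly forces executors to register it,
// instead of relying on Spark to derive it from the JDBC URL.
val props = new Properties()
props.setProperty("user", "user")
props.setProperty("password", "password")
props.setProperty("driver", "org.postgresql.Driver")

val df = sqlContext.read.jdbc(
  "jdbc:postgresql://whatever.you.want/database",   // placeholder URL from the report
  "some_table",                                     // placeholder table name
  props)
{code}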



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16700) StructType doesn't accept Python dicts anymore

2016-07-30 Thread Jay Teguh Wijaya Purwanto (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400658#comment-15400658
 ] 

Jay Teguh Wijaya Purwanto edited comment on SPARK-16700 at 7/30/16 12:34 PM:
-

Using a `Row` object, but with multiple struct field types, also returns a 
similar error:

{code}
_struct = [
  SparkTypes.StructField('string_field', SparkTypes.StringType(), True),
  SparkTypes.StructField('long_field', SparkTypes.LongType(), True),
  SparkTypes.StructField('double_field', SparkTypes.DoubleType(), True)
]
_rdd = sc.parallelize([Row(string_field='1', long_field=1, double_field=1.1)])

## Both methods do not work:
# _schema = SparkTypes.StructType()
# for _s in _struct:
#   _schema.add(_s)
_schema = SparkTypes.StructType(_struct)

_df = sqlContext.createDataFrame(_rdd, schema=_schema)
_df.take(1)
{code}

Returned error:

{code}
DoubleType can not accept object '1' in type 
{code}


was (Author: jaycode):
When using `Row` object, but with multiple struct types, also returns similar 
error:

```
_struct = [
  SparkTypes.StructField('string_field', SparkTypes.StringType(), True),
  SparkTypes.StructField('long_field', SparkTypes.LongType(), True),
  SparkTypes.StructField('double_field', SparkTypes.DoubleType(), True)
]
_rdd = sc.parallelize([Row(string_field='1', long_field=1, double_field=1.1)])

## Both methods do not work:
# _schema = SparkTypes.StructType()
# for _s in _struct:
#   _schema.add(_s)
_schema = SparkTypes.StructType(_struct)

_df = sqlContext.createDataFrame(_rdd, schema=_schema)
_df.take(1)
```

Returned error:

```
DoubleType can not accept object '1' in type 
```

> StructType doesn't accept Python dicts anymore
> --
>
> Key: SPARK-16700
> URL: https://issues.apache.org/jira/browse/SPARK-16700
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.0.0
>Reporter: Sylvain Zimmer
>
> Hello,
> I found this issue while testing my codebase with 2.0.0-rc5
> StructType in Spark 1.6.2 accepts the Python dict type, which is very 
> handy. 2.0.0-rc5 does not and throws an error.
> I don't know if this was intended but I'd advocate for this behaviour to 
> remain the same. MapType is probably wasteful when your key names never 
> change and switching to Python tuples would be cumbersome.
> Here is a minimal script to reproduce the issue: 
> {code}
> from pyspark import SparkContext
> from pyspark.sql import types as SparkTypes
> from pyspark.sql import SQLContext
> sc = SparkContext()
> sqlc = SQLContext(sc)
> struct_schema = SparkTypes.StructType([
> SparkTypes.StructField("id", SparkTypes.LongType())
> ])
> rdd = sc.parallelize([{"id": 0}, {"id": 1}])
> df = sqlc.createDataFrame(rdd, struct_schema)
> print df.collect()
> # 1.6.2 prints [Row(id=0), Row(id=1)]
> # 2.0.0-rc5 raises TypeError: StructType can not accept object {'id': 0} in 
> type 
> {code}
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16700) StructType doesn't accept Python dicts anymore

2016-07-30 Thread Jay Teguh Wijaya Purwanto (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400658#comment-15400658
 ] 

Jay Teguh Wijaya Purwanto commented on SPARK-16700:
---

Using a `Row` object, but with multiple struct field types, also returns a 
similar error:

{code}
_struct = [
  SparkTypes.StructField('string_field', SparkTypes.StringType(), True),
  SparkTypes.StructField('long_field', SparkTypes.LongType(), True),
  SparkTypes.StructField('double_field', SparkTypes.DoubleType(), True)
]
_rdd = sc.parallelize([Row(string_field='1', long_field=1, double_field=1.1)])

## Both methods do not work:
# _schema = SparkTypes.StructType()
# for _s in _struct:
#   _schema.add(_s)
_schema = SparkTypes.StructType(_struct)

_df = sqlContext.createDataFrame(_rdd, schema=_schema)
_df.take(1)
{code}

Returned error:

{code}
DoubleType can not accept object '1' in type 
{code}

> StructType doesn't accept Python dicts anymore
> --
>
> Key: SPARK-16700
> URL: https://issues.apache.org/jira/browse/SPARK-16700
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.0.0
>Reporter: Sylvain Zimmer
>
> Hello,
> I found this issue while testing my codebase with 2.0.0-rc5
> StructType in Spark 1.6.2 accepts the Python dict type, which is very 
> handy. 2.0.0-rc5 does not and throws an error.
> I don't know if this was intended but I'd advocate for this behaviour to 
> remain the same. MapType is probably wasteful when your key names never 
> change and switching to Python tuples would be cumbersome.
> Here is a minimal script to reproduce the issue: 
> {code}
> from pyspark import SparkContext
> from pyspark.sql import types as SparkTypes
> from pyspark.sql import SQLContext
> sc = SparkContext()
> sqlc = SQLContext(sc)
> struct_schema = SparkTypes.StructType([
> SparkTypes.StructField("id", SparkTypes.LongType())
> ])
> rdd = sc.parallelize([{"id": 0}, {"id": 1}])
> df = sqlc.createDataFrame(rdd, struct_schema)
> print df.collect()
> # 1.6.2 prints [Row(id=0), Row(id=1)]
> # 2.0.0-rc5 raises TypeError: StructType can not accept object {'id': 0} in 
> type 
> {code}
> Thanks!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-16694) Use for/foreach rather than map for Unit expressions whose side effects are required

2016-07-30 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-16694.
---
   Resolution: Fixed
Fix Version/s: 2.1.0

Issue resolved by pull request 14332
[https://github.com/apache/spark/pull/14332]

> Use for/foreach rather than map for Unit expressions whose side effects are 
> required
> 
>
> Key: SPARK-16694
> URL: https://issues.apache.org/jira/browse/SPARK-16694
> Project: Spark
>  Issue Type: Improvement
>  Components: Examples, MLlib, Spark Core, SQL, Streaming
>Reporter: Sean Owen
>Assignee: Sean Owen
>Priority: Minor
> Fix For: 2.1.0
>
>
> {{map}} is misused in many places where {{foreach}} is intended. This caused 
> a bug in https://issues.apache.org/jira/browse/SPARK-16664 and might be a 
> latent bug elsewhere; it's also easy to find with IJ inspections. Worth 
> patching up. 
> To illustrate the general problem, {{map}} happens to work in Scala where the 
> collection isn't lazy, but will fail to execute the code when it is. {{map}} 
> also causes a collection of {{Unit}} to be created pointlessly.
> {code}
> scala> val foo = Seq(1,2,3)
> foo: Seq[Int] = List(1, 2, 3)
> scala> foo.map(println)
> 1
> 2
> 3
> res0: Seq[Unit] = List((), (), ())
> scala> foo.view.map(println)
> res1: scala.collection.SeqView[Unit,Seq[_]] = SeqViewM(...)
> scala> foo.view.foreach(println)
> 1
> 2
> 3
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-16797) Repartiton call w/ 0 partitions drops data

2016-07-30 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen closed SPARK-16797.
-
   Resolution: Duplicate
Fix Version/s: (was: 2.0.0)

> Repartiton call w/ 0 partitions drops data
> --
>
> Key: SPARK-16797
> URL: https://issues.apache.org/jira/browse/SPARK-16797
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.2
>Reporter: Bryan Jeffrey
>Priority: Minor
>  Labels: easyfix
>
> When you call RDD.repartition(0) or DStream.repartition(0), the input data 
> is silently dropped. This should not fail silently; instead, an exception 
> should be thrown to alert the user to the issue.
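
For illustration, a minimal sketch of the behaviour described (assuming a 
running {{SparkContext}} named {{sc}}):

{code}
val rdd = sc.parallelize(1 to 100)

val empty = rdd.repartition(0)    // zero partitions requested

println(empty.getNumPartitions)   // 0
println(empty.count())            // 0 -- all 100 records silently dropped, no error raised
{code}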



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-16797) Repartiton call w/ 0 partitions drops data

2016-07-30 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reopened SPARK-16797:
---

> Repartiton call w/ 0 partitions drops data
> --
>
> Key: SPARK-16797
> URL: https://issues.apache.org/jira/browse/SPARK-16797
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.2
>Reporter: Bryan Jeffrey
>Priority: Minor
>  Labels: easyfix
>
> When you call RDD.repartition(0) or DStream.repartition(0), the input data 
> is silently dropped. This should not fail silently; instead, an exception 
> should be thrown to alert the user to the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16815) Dataset[List[T]] leads to ArrayStoreException

2016-07-30 Thread TobiasP (JIRA)
TobiasP created SPARK-16815:
---

 Summary: Dataset[List[T]] leads to ArrayStoreException
 Key: SPARK-16815
 URL: https://issues.apache.org/jira/browse/SPARK-16815
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: TobiasP
Priority: Minor


{noformat}
scala> spark.sqlContext.createDataset(sc.parallelize(List(1) :: Nil)).collect
java.lang.ArrayStoreException: scala.collection.mutable.WrappedArray$ofRef  
  at scala.collection.mutable.ArrayBuilder$ofRef.$plus$eq(ArrayBuilder.scala:87)
  at scala.collection.mutable.ArrayBuilder$ofRef.$plus$eq(ArrayBuilder.scala:56)
  at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
  at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
  at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
  at 
org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2218)
  at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
  at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2568)
  at 
org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2217)
  at 
org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$collect$1.apply(Dataset.scala:)
  at 
org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$collect$1.apply(Dataset.scala:)
  at org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2581)
  at 
org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:)
  at org.apache.spark.sql.Dataset.collect(Dataset.scala:2198)
  ... 48 elided

{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16797) Repartiton call w/ 0 partitions drops data

2016-07-30 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400542#comment-15400542
 ] 

Dongjoon Hyun commented on SPARK-16797:
---

Oh, I see. Never mind, [~bjeffrey].

> Repartiton call w/ 0 partitions drops data
> --
>
> Key: SPARK-16797
> URL: https://issues.apache.org/jira/browse/SPARK-16797
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.6.2
>Reporter: Bryan Jeffrey
>Priority: Minor
>  Labels: easyfix
> Fix For: 2.0.0
>
>
> When you call RDD.repartition(0) or DStream.repartition(0), the input data 
> is silently dropped. This should not fail silently; instead, an exception 
> should be thrown to alert the user to the issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16807) Optimize some ABS() statements

2016-07-30 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400532#comment-15400532
 ] 

Kazuaki Ishizaki commented on SPARK-16807:
--

This would be interesting if we could ensure {{x - y}} is not {{infinite}} or 
{{NaN}}. Since I am not familiar with SQL, I do not know how we can ensure this 
condition in Spark SQL.

In general, the generated code seems to take care of the case where 
{{filter_value6}} is {{infinite}} or {{NaN}}. It would be good to read 
[abs(float)|https://docs.oracle.com/javase/7/docs/api/java/lang/Math.html#abs(float)]
 and 
[nanSafeCompareFloats|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/Utils.scala#L1615].
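
To make that concern concrete, a small plain-Scala sketch (not Catalyst-generated 
code, and not from the issue) of why {{ABS(x - y) > 0}} and {{x != y}} only agree 
when {{x - y}} cannot be {{NaN}}:

{code}
// Counterexample: two equal infinite values
val x = Float.PositiveInfinity
val y = Float.PositiveInfinity

val diff = x - y                 // Inf - Inf == NaN
println(x != y)                  // false: the simplified predicate would drop the row
println(math.abs(diff) > 0)      // false under plain Java/Scala float semantics...

// ...but the generated code compares with Spark's NaN-safe ordering
// (nanSafeCompareFloats), which treats NaN as larger than any number, so
// ABS(x - y) > 0 keeps the row there. The rewrite is therefore only safe when
// x - y can be shown not to be NaN, e.g. when neither operand can be infinite.
{code}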

> Optimize some ABS() statements
> --
>
> Key: SPARK-16807
> URL: https://issues.apache.org/jira/browse/SPARK-16807
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Sylvain Zimmer
>Priority: Minor
>
> I'm not a Catalyst expert, but I think some use cases for the ABS() function 
> could generate simpler code.
> This is the code generated when doing something like {{ABS(x - y) > 0}} or 
> {{ABS(x - y) = 0}} in Spark SQL:
> {code}
> /* 267 */   float filter_value6 = -1.0f;
> /* 268 */   filter_value6 = agg_value27 - agg_value32;
> /* 269 */   float filter_value5 = -1.0f;
> /* 270 */   filter_value5 = (float)(java.lang.Math.abs(filter_value6));
> /* 271 */
> /* 272 */   boolean filter_value4 = false;
> /* 273 */   filter_value4 = 
> org.apache.spark.util.Utils.nanSafeCompareFloats(filter_value5, 0.0f) > 0;
> /* 274 */   if (!filter_value4) continue;
> {code}
> Maybe it could all be simplified to something like this?
> {code}
> filter_value4 = (agg_value27 != agg_value32)
> {code}
> (Of course you could write {{x != y}} directly in the SQL query, but the 
> {{0}} in my example could be a configurable threshold, not something you can 
> hardcode)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16814) Fix deprecated use of ParquetWriter in Parquet test suites

2016-07-30 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400512#comment-15400512
 ] 

Apache Spark commented on SPARK-16814:
--

User 'holdenk' has created a pull request for this issue:
https://github.com/apache/spark/pull/14419

> Fix deprecated use of ParquetWriter in Parquet test suites
> --
>
> Key: SPARK-16814
> URL: https://issues.apache.org/jira/browse/SPARK-16814
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: holdenk
>
> Replace deprecated ParquetWriter with the new builders



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16814) Fix deprecated use of ParquetWriter in Parquet test suites

2016-07-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16814:


Assignee: Apache Spark

> Fix deprecated use of ParquetWriter in Parquet test suites
> --
>
> Key: SPARK-16814
> URL: https://issues.apache.org/jira/browse/SPARK-16814
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: holdenk
>Assignee: Apache Spark
>
> Replace deprecated ParquetWriter with the new builders



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-16814) Fix deprecated use of ParquetWriter in Parquet test suites

2016-07-30 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-16814:


Assignee: (was: Apache Spark)

> Fix deprecated use of ParquetWriter in Parquet test suites
> --
>
> Key: SPARK-16814
> URL: https://issues.apache.org/jira/browse/SPARK-16814
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: holdenk
>
> Replace deprecated ParquetWriter with the new builders



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16814) Fix deprecated use of ParquetWriter in Parquet test suites

2016-07-30 Thread holdenk (JIRA)
holdenk created SPARK-16814:
---

 Summary: Fix deprecated use of ParquetWriter in Parquet test suites
 Key: SPARK-16814
 URL: https://issues.apache.org/jira/browse/SPARK-16814
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: holdenk


Replace deprecated ParquetWriter with the new builders



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org