[jira] [Commented] (SPARK-18879) Spark SQL support for Hive hooks regressed

2016-12-17 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15756708#comment-15756708
 ] 

Sean Owen commented on SPARK-18879:
---

I don't know that this function of Hive itself is something that Spark promises 
to support. Spark has its own API, and that is what is generally supported and 
guaranteed.
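
For reference, Spark 2.0+ exposes its own post-execution callback, 
{{org.apache.spark.sql.util.QueryExecutionListener}}; the following is only a minimal 
illustrative sketch, assuming a {{SparkSession}} named {{spark}} (e.g. in spark-shell), 
not something promised by this issue:

{code}
// Minimal sketch of Spark's own query-completion hook (Spark 2.0+).
// Assumes a SparkSession named `spark` is in scope.
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

val planLogger = new QueryExecutionListener {
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
    // The analyzed/optimized/physical plans are available here for inspection or logging.
    println(s"$funcName succeeded in ${durationNs / 1e6} ms:\n${qe.executedPlan}")
  }
  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = {
    println(s"$funcName failed: ${exception.getMessage}")
  }
}

spark.listenerManager.register(planLogger)
{code}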

> Spark SQL support for Hive hooks regressed
> --
>
> Key: SPARK-18879
> URL: https://issues.apache.org/jira/browse/SPARK-18879
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.2
>Reporter: Atul Payapilly
>
> As per the stack trace from this post: 
> http://ihorbobak.com/index.php/2015/05/08/113/
> run on Spark 1.3.1
> hive.exec.pre.hooks Class not found:org.apache.hadoop.hive.ql.hooks.ATSHook
> FAILED: Hive Internal Error: 
> java.lang.ClassNotFoundException(org.apache.hadoop.hive.ql.hooks.ATSHook)
> java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.hooks.ATSHook
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:270)
> at org.apache.hadoop.hive.ql.hooks.HookUtils.getHooks(HookUtils.java:59)
> at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1172)
> at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1156)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1206)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
> at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:318)
> at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:290)
> at 
> org.apache.spark.sql.hive.execution.HiveNativeCommand.run(HiveNativeCommand.scala:33)
> at 
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:54)
> at 
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:54)
> at 
> org.apache.spark.sql.execution.ExecutedCommand.execute(commands.scala:64)
> at 
> org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:1099)
> at 
> org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:1099)
> at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:147)
> at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
> at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
> at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:101)
> at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(Shim13.scala:164)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:231)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:218)
> at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:233)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:344)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> It looks like Spark used to rely on the Hive Driver for execution and 
> supported Hive hooks. The current code path does not rely on the Hive Driver, 
> so support for Hive hooks regressed. This is problematic; for example, there 
> is no way to tell which partitions were updated as part of a query.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-18879) Spark SQL support for Hive hooks regressed

2016-12-17 Thread Atul Payapilly (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15756759#comment-15756759
 ] 

Atul Payapilly commented on SPARK-18879:


That's a fair point. The feature was supported incidentally, as a side effect of 
using the HiveDriver initially, so it's technically not a regression of a 
supported feature.

To add some context, the reason I'm looking into this is that some of the 
functionality of the Hive hooks is quite useful. Maybe I should close this and 
open a feature request? The functionality I think would be most important is 
being able to tell, at the end of a query, which partitions were updated, 
specifically in a dynamic partition insert, since this can't be derived by 
analyzing the query alone.

What are your thoughts on this?
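
For context on what the Hive side provides, here is a rough, illustrative sketch of a 
post-execution hook against Hive's {{ExecuteWithHookContext}} API; class and method 
names should be checked against the Hive version in use:

{code}
// Rough sketch of a Hive post-execution hook that reports which partitions a
// query wrote to. Illustrative only; verify the API against your Hive version.
import scala.collection.JavaConverters._
import org.apache.hadoop.hive.ql.hooks.{Entity, ExecuteWithHookContext, HookContext}

class UpdatedPartitionsHook extends ExecuteWithHookContext {
  override def run(hookContext: HookContext): Unit = {
    val updatedPartitions = hookContext.getOutputs.asScala
      .filter(_.getType == Entity.Type.PARTITION)   // keep only partition write entities
      .map(_.getPartition.getName)
    // A real hook would send this to a logger or an external audit system.
    println(s"Partitions written by this query: ${updatedPartitions.mkString(", ")}")
  }
}
// Enabled in Hive by setting hive.exec.post.hooks to the hook's fully-qualified class name.
{code}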

> Spark SQL support for Hive hooks regressed
> --
>
> Key: SPARK-18879
> URL: https://issues.apache.org/jira/browse/SPARK-18879
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.2
>Reporter: Atul Payapilly
>
> As per the stack trace from this post: 
> http://ihorbobak.com/index.php/2015/05/08/113/
> run on Spark 1.3.1
> hive.exec.pre.hooks Class not found:org.apache.hadoop.hive.ql.hooks.ATSHook
> FAILED: Hive Internal Error: 
> java.lang.ClassNotFoundException(org.apache.hadoop.hive.ql.hooks.ATSHook)
> java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.hooks.ATSHook
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:270)
> at org.apache.hadoop.hive.ql.hooks.HookUtils.getHooks(HookUtils.java:59)
> at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1172)
> at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1156)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1206)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
> at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:318)
> at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:290)
> at 
> org.apache.spark.sql.hive.execution.HiveNativeCommand.run(HiveNativeCommand.scala:33)
> at 
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:54)
> at 
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:54)
> at 
> org.apache.spark.sql.execution.ExecutedCommand.execute(commands.scala:64)
> at 
> org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:1099)
> at 
> org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:1099)
> at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:147)
> at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
> at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
> at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:101)
> at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(Shim13.scala:164)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:231)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:218)
> at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:233)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:344)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> It looks like Spark used to rely on the Hive Driver for execution and 
> supported Hive hooks. The current code path does not rely on the Hive Driver, 
> so support for Hive hooks regressed. This is problematic; for example, there 
> is no way to tell which partitions were updated as part of a query.

[jira] [Commented] (SPARK-18891) Support for specific collection types

2016-12-17 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SPARK-18891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15756842#comment-15756842
 ] 

Michal Šenkýř commented on SPARK-18891:
---

[PR #16240|https://github.com/apache/spark/pull/16240] should be relevant. It 
adds deserialization support for arbitrary sequence types (serialization was 
already supported). The only thing not working yet is the {{Seq.toDS}} method 
due to implicits resolution ({{SparkSession.createDataset}} works fine).
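
For concreteness, a small spark-shell sketch of the two entry points being compared, 
assuming a {{SparkSession}} named {{spark}}; whether each call actually succeeds at 
runtime depends on the encoder support added by the PR:

{code}
// Illustrative only: contrasts the explicit and implicit Dataset entry points.
case class SpecificCollection(aList: List[Int])
val data = Seq(SpecificCollection(1 :: Nil))

import spark.implicits._

// Explicit entry point, which the PR targets first:
val ds1 = spark.createDataset(data)

// Implicit entry point; per the comment above, its implicit resolution is still open:
val ds2 = data.toDS()
{code}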

> Support for specific collection types
> -
>
> Key: SPARK-18891
> URL: https://issues.apache.org/jira/browse/SPARK-18891
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.3, 2.1.0
>Reporter: Michael Armbrust
>Priority: Critical
>
> Encoders treat all collections the same (e.g. {{Seq}} vs {{List}}), which 
> forces users to define classes with only the most generic type.
> An [example 
> error|https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/2398463439880241/2840265927289860/latest.html]:
> {code}
> case class SpecificCollection(aList: List[Int])
> Seq(SpecificCollection(1 :: Nil)).toDS().collect()
> {code}
> {code}
> java.lang.RuntimeException: Error while decoding: 
> java.util.concurrent.ExecutionException: java.lang.Exception: failed to 
> compile: org.codehaus.commons.compiler.CompileException: File 
> 'generated.java', Line 98, Column 120: No applicable constructor/method found 
> for actual parameters "scala.collection.Seq"; candidates are: 
> "line29e7e4b1e36445baa3505b2e102aa86b29.$read$$iw$$iw$$iw$$iw$SpecificCollection(scala.collection.immutable.List)"
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-18891) Support for specific collection types

2016-12-17 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SPARK-18891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15756842#comment-15756842
 ] 

Michal Šenkýř edited comment on SPARK-18891 at 12/17/16 11:18 AM:
--

[PR #16240|https://github.com/apache/spark/pull/16240] should be relevant. It 
adds encoders and deserialization support for arbitrary sequence types 
(serialization was already supported). The only thing not working yet is the 
{{Seq.toDS}} method due to implicits resolution ({{SparkSession.createDataset}} 
works fine).


was (Author: michalsenkyr):
[PR #16240|https://github.com/apache/spark/pull/16240] should be relevant. It 
adds deserialization support for arbitrary sequence types (serialization was 
already supported). The only thing not working yet is the {{Seq.toDS}} method 
due to implicits resolution ({{SparkSession.createDataset}} works fine).

> Support for specific collection types
> -
>
> Key: SPARK-18891
> URL: https://issues.apache.org/jira/browse/SPARK-18891
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.3, 2.1.0
>Reporter: Michael Armbrust
>Priority: Critical
>
> Encoders treat all collections the same (e.g. {{Seq}} vs {{List}}), which 
> forces users to define classes with only the most generic type.
> An [example 
> error|https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/2398463439880241/2840265927289860/latest.html]:
> {code}
> case class SpecificCollection(aList: List[Int])
> Seq(SpecificCollection(1 :: Nil)).toDS().collect()
> {code}
> {code}
> java.lang.RuntimeException: Error while decoding: 
> java.util.concurrent.ExecutionException: java.lang.Exception: failed to 
> compile: org.codehaus.commons.compiler.CompileException: File 
> 'generated.java', Line 98, Column 120: No applicable constructor/method found 
> for actual parameters "scala.collection.Seq"; candidates are: 
> "line29e7e4b1e36445baa3505b2e102aa86b29.$read$$iw$$iw$$iw$$iw$SpecificCollection(scala.collection.immutable.List)"
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-18891) Support for specific collection types

2016-12-17 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SPARK-18891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15756842#comment-15756842
 ] 

Michal Šenkýř edited comment on SPARK-18891 at 12/17/16 11:18 AM:
--

[PR #16240|https://github.com/apache/spark/pull/16240] should be relevant. It 
adds implicit encoders and deserialization support for arbitrary sequence types 
(serialization was already supported). The only thing not working yet is the 
{{Seq.toDS}} method due to implicits resolution ({{SparkSession.createDataset}} 
works fine).


was (Author: michalsenkyr):
[PR #16240|https://github.com/apache/spark/pull/16240] should be relevant. It 
adds encoders and deserialization support for arbitrary sequence types 
(serialization was already supported). The only thing not working yet is the 
{{Seq.toDS}} method due to implicits resolution ({{SparkSession.createDataset}} 
works fine).

> Support for specific collection types
> -
>
> Key: SPARK-18891
> URL: https://issues.apache.org/jira/browse/SPARK-18891
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.3, 2.1.0
>Reporter: Michael Armbrust
>Priority: Critical
>
> Encoders treat all collections the same (e.g. {{Seq}} vs {{List}}), which 
> forces users to define classes with only the most generic type.
> An [example 
> error|https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/2398463439880241/2840265927289860/latest.html]:
> {code}
> case class SpecificCollection(aList: List[Int])
> Seq(SpecificCollection(1 :: Nil)).toDS().collect()
> {code}
> {code}
> java.lang.RuntimeException: Error while decoding: 
> java.util.concurrent.ExecutionException: java.lang.Exception: failed to 
> compile: org.codehaus.commons.compiler.CompileException: File 
> 'generated.java', Line 98, Column 120: No applicable constructor/method found 
> for actual parameters "scala.collection.Seq"; candidates are: 
> "line29e7e4b1e36445baa3505b2e102aa86b29.$read$$iw$$iw$$iw$$iw$SpecificCollection(scala.collection.immutable.List)"
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-18912) append to a non-file-based data source table should detect columns number mismatch

2016-12-17 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-18912:
---

 Summary: append to a non-file-based data source table should 
detect columns number mismatch
 Key: SPARK-18912
 URL: https://issues.apache.org/jira/browse/SPARK-18912
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Wenchen Fan
Assignee: Wenchen Fan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-18913) append to a table with special column names should work

2016-12-17 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-18913:
---

 Summary: append to a table with special column names should work
 Key: SPARK-18913
 URL: https://issues.apache.org/jira/browse/SPARK-18913
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Wenchen Fan
Assignee: Wenchen Fan






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18913) append to a table with special column names should work

2016-12-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18913:


Assignee: Apache Spark  (was: Wenchen Fan)

> append to a table with special column names should work
> ---
>
> Key: SPARK-18913
> URL: https://issues.apache.org/jira/browse/SPARK-18913
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18912) append to a non-file-based data source table should detect columns number mismatch

2016-12-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18912:


Assignee: Wenchen Fan  (was: Apache Spark)

> append to a non-file-based data source table should detect columns number 
> mismatch
> --
>
> Key: SPARK-18912
> URL: https://issues.apache.org/jira/browse/SPARK-18912
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18913) append to a table with special column names should work

2016-12-17 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15756891#comment-15756891
 ] 

Apache Spark commented on SPARK-18913:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/16313

> append to a table with special column names should work
> ---
>
> Key: SPARK-18913
> URL: https://issues.apache.org/jira/browse/SPARK-18913
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18912) append to a non-file-based data source table should detect columns number mismatch

2016-12-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18912:


Assignee: Apache Spark  (was: Wenchen Fan)

> append to a non-file-based data source table should detect columns number 
> mismatch
> --
>
> Key: SPARK-18912
> URL: https://issues.apache.org/jira/browse/SPARK-18912
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18912) append to a non-file-based data source table should detect columns number mismatch

2016-12-17 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15756890#comment-15756890
 ] 

Apache Spark commented on SPARK-18912:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/16313

> append to a non-file-based data source table should detect columns number 
> mismatch
> --
>
> Key: SPARK-18912
> URL: https://issues.apache.org/jira/browse/SPARK-18912
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18913) append to a table with special column names should work

2016-12-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18913:


Assignee: Wenchen Fan  (was: Apache Spark)

> append to a table with special column names should work
> ---
>
> Key: SPARK-18913
> URL: https://issues.apache.org/jira/browse/SPARK-18913
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18879) Spark SQL support for Hive hooks regressed

2016-12-17 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15756917#comment-15756917
 ] 

Sean Owen commented on SPARK-18879:
---

I don't think it would be a goal to support Hive functionality. I don't know 
enough to say what, if anything, the equivalent is in Spark, or whether there 
would be one.

> Spark SQL support for Hive hooks regressed
> --
>
> Key: SPARK-18879
> URL: https://issues.apache.org/jira/browse/SPARK-18879
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.2
>Reporter: Atul Payapilly
>
> As per the stack trace from this post: 
> http://ihorbobak.com/index.php/2015/05/08/113/
> run on Spark 1.3.1
> hive.exec.pre.hooks Class not found:org.apache.hadoop.hive.ql.hooks.ATSHook
> FAILED: Hive Internal Error: 
> java.lang.ClassNotFoundException(org.apache.hadoop.hive.ql.hooks.ATSHook)
> java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.hooks.ATSHook
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:270)
> at org.apache.hadoop.hive.ql.hooks.HookUtils.getHooks(HookUtils.java:59)
> at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1172)
> at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1156)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1206)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
> at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:318)
> at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:290)
> at 
> org.apache.spark.sql.hive.execution.HiveNativeCommand.run(HiveNativeCommand.scala:33)
> at 
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:54)
> at 
> org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:54)
> at 
> org.apache.spark.sql.execution.ExecutedCommand.execute(commands.scala:64)
> at 
> org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:1099)
> at 
> org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:1099)
> at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:147)
> at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
> at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
> at org.apache.spark.sql.hive.HiveContext.sql(HiveContext.scala:101)
> at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.run(Shim13.scala:164)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:231)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:218)
> at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:233)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:344)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> It looks like Spark used to rely on the Hive Driver for execution and 
> supported Hive hooks. The current code path does not rely on the Hive Driver, 
> so support for Hive hooks regressed. This is problematic; for example, there 
> is no way to tell which partitions were updated as part of a query.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-18891) Support for specific collection types

2016-12-17 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SPARK-18891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15756964#comment-15756964
 ] 

Michal Šenkýř commented on SPARK-18891:
---

I should also add that {{Seq.toDS}} only fails for non-Seq collections containing 
case classes, so the example in the description actually works with the proposed 
changes.

> Support for specific collection types
> -
>
> Key: SPARK-18891
> URL: https://issues.apache.org/jira/browse/SPARK-18891
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.3, 2.1.0
>Reporter: Michael Armbrust
>Priority: Critical
>
> Encoders treat all collections the same (e.g. {{Seq}} vs {{List}}), which 
> forces users to define classes with only the most generic type.
> An [example 
> error|https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/2398463439880241/2840265927289860/latest.html]:
> {code}
> case class SpecificCollection(aList: List[Int])
> Seq(SpecificCollection(1 :: Nil)).toDS().collect()
> {code}
> {code}
> java.lang.RuntimeException: Error while decoding: 
> java.util.concurrent.ExecutionException: java.lang.Exception: failed to 
> compile: org.codehaus.commons.compiler.CompileException: File 
> 'generated.java', Line 98, Column 120: No applicable constructor/method found 
> for actual parameters "scala.collection.Seq"; candidates are: 
> "line29e7e4b1e36445baa3505b2e102aa86b29.$read$$iw$$iw$$iw$$iw$SpecificCollection(scala.collection.immutable.List)"
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18648) spark-shell --jars option does not add jars to classpath on windows

2016-12-17 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15757006#comment-15757006
 ] 

Sean Owen commented on SPARK-18648:
---

I see, I get it now. It is complaining about c, not e. The call to textFile is 
actually not relevant.
I don't know the answer, but I would have thought that {{file:///c:/test/my.jar}} 
would work, or {{/c:/test/my.jar}}.
[~hyukjin.kwon], given your familiarity with Windows, do you know the right way 
to pass a path?

> spark-shell --jars option does not add jars to classpath on windows
> ---
>
> Key: SPARK-18648
> URL: https://issues.apache.org/jira/browse/SPARK-18648
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell, Windows
>Affects Versions: 2.0.2
> Environment: Windows 7 x64
>Reporter: Michel Lemay
>  Labels: windows
>
> I can't import symbols from command line jars when in the shell:
> Adding jars via --jars:
> {code}
> spark-shell --master local[*] --jars path\to\deeplearning4j-core-0.7.0.jar
> {code}
> Same result if I add it through maven coordinates:
> {code}spark-shell --master local[*] --packages 
> org.deeplearning4j:deeplearning4j-core:0.7.0
> {code}
> I end up with:
> {code}
> scala> import org.deeplearning4j
> <console>:23: error: object deeplearning4j is not a member of package org
>import org.deeplearning4j
> {code}
> NOTE: It is working as expected when running on linux.
> Sample output with --verbose:
> {code}
> Using properties file: null
> Parsed arguments:
>   master  local[*]
>   deployMode  null
>   executorMemory  null
>   executorCores   null
>   totalExecutorCores  null
>   propertiesFile  null
>   driverMemorynull
>   driverCores null
>   driverExtraClassPathnull
>   driverExtraLibraryPath  null
>   driverExtraJavaOptions  null
>   supervise   false
>   queue   null
>   numExecutorsnull
>   files   null
>   pyFiles null
>   archivesnull
>   mainClass   org.apache.spark.repl.Main
>   primaryResource spark-shell
>   nameSpark shell
>   childArgs   []
>   jars
> file:/C:/Apps/Spark/spark-2.0.2-bin-hadoop2.4/bin/../deeplearning4j-core-0.7.0.jar
>   packagesnull
>   packagesExclusions  null
>   repositoriesnull
>   verbose true
> Spark properties used, including those specified through
>  --conf and those from the properties file null:
> Main class:
> org.apache.spark.repl.Main
> Arguments:
> System properties:
> SPARK_SUBMIT -> true
> spark.app.name -> Spark shell
> spark.jars -> 
> file:/C:/Apps/Spark/spark-2.0.2-bin-hadoop2.4/bin/../deeplearning4j-core-0.7.0.jar
> spark.submit.deployMode -> client
> spark.master -> local[*]
> Classpath elements:
> file:/C:/Apps/Spark/spark-2.0.2-bin-hadoop2.4/bin/../deeplearning4j-core-0.7.0.jar
> 16/11/30 08:30:49 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 16/11/30 08:30:51 WARN SparkContext: Use an existing SparkContext, some 
> configuration may not take effect.
> Spark context Web UI available at http://192.168.70.164:4040
> Spark context available as 'sc' (master = local[*], app id = 
> local-1480512651325).
> Spark session available as 'spark'.
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.0.2
>       /_/
> Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_101)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> import org.deeplearning4j
> <console>:23: error: object deeplearning4j is not a member of package org
>import org.deeplearning4j
>   ^
> scala>
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-18485) Underlying integer overflow when create ChunkedByteBufferOutputStream in MemoryStore

2016-12-17 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-18485.
---
   Resolution: Fixed
Fix Version/s: 2.2.0

Issue resolved by pull request 15915
[https://github.com/apache/spark/pull/15915]

> Underlying integer overflow when create ChunkedByteBufferOutputStream in 
> MemoryStore
> 
>
> Key: SPARK-18485
> URL: https://issues.apache.org/jira/browse/SPARK-18485
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.3, 2.0.2
>Reporter: Genmao Yu
> Fix For: 2.2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18485) Underlying integer overflow when create ChunkedByteBufferOutputStream in MemoryStore

2016-12-17 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-18485:
--
Assignee: Genmao Yu
Priority: Minor  (was: Major)

> Underlying integer overflow when create ChunkedByteBufferOutputStream in 
> MemoryStore
> 
>
> Key: SPARK-18485
> URL: https://issues.apache.org/jira/browse/SPARK-18485
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.3, 2.0.2
>Reporter: Genmao Yu
>Assignee: Genmao Yu
>Priority: Minor
> Fix For: 2.2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-18914) Local UDTs test fails due to "ClassCastException: java.lang.Integer cannot be cast to org.apache.spark.sql.UDT$MyDenseVector"

2016-12-17 Thread Jacek Laskowski (JIRA)
Jacek Laskowski created SPARK-18914:
---

 Summary: Local UDTs test fails due to "ClassCastException: 
java.lang.Integer cannot be cast to org.apache.spark.sql.UDT$MyDenseVector"
 Key: SPARK-18914
 URL: https://issues.apache.org/jira/browse/SPARK-18914
 Project: Spark
  Issue Type: Test
  Components: SQL, Tests
Affects Versions: 2.2.0
Reporter: Jacek Laskowski
Priority: Minor


{code}
[info] - Local UDTs *** FAILED *** (110 milliseconds)
[info]   java.lang.ClassCastException: java.lang.Integer cannot be cast to 
org.apache.spark.sql.UDT$MyDenseVector
[info]   at 
org.apache.spark.sql.UserDefinedTypeSuite$$anonfun$9.apply(UserDefinedTypeSuite.scala:135)
[info]   at 
org.apache.spark.sql.UserDefinedTypeSuite$$anonfun$9.apply(UserDefinedTypeSuite.scala:129)
[info]   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
[info]   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68)
[info]   at 
org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:183)
[info]   at 
org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
[info]   at 
org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
[info]   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:196)
[info]   at 
org.apache.spark.sql.UserDefinedTypeSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(UserDefinedTypeSuite.scala:74)
[info]   at 
org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:221)
[info]   at 
org.apache.spark.sql.UserDefinedTypeSuite.runTest(UserDefinedTypeSuite.scala:74)
[info]   at 
org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
[info]   at 
org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
[info]   at 
org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396)
[info]   at 
org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384)
[info]   at scala.collection.immutable.List.foreach(List.scala:381)
[info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384)
[info]   at 
org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:379)
[info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:461)
[info]   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:229)
[info]   at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
[info]   at org.scalatest.Suite$class.run(Suite.scala:1147)
[info]   at 
org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
[info]   at 
org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
[info]   at 
org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
[info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:521)
[info]   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:233)
[info]   at 
org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:31)
[info]   at 
org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:213)
[info]   at 
org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:210)
[info]   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:31)
[info]   at 
org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:314)
[info]   at 
org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:480)
[info]   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
[info]   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
[info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[info]   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[info]   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[info]   at java.lang.Thread.run(Thread.java:745)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18914) Local UDTs test (org.apache.spark.sql.UserDefinedTypeSuite) fails due to "ClassCastException: java.lang.Integer cannot be cast to org.apache.spark.sql.UDT$MyDenseVector"

2016-12-17 Thread Jacek Laskowski (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacek Laskowski updated SPARK-18914:

Summary: Local UDTs test (org.apache.spark.sql.UserDefinedTypeSuite) fails 
due to "ClassCastException: java.lang.Integer cannot be cast to 
org.apache.spark.sql.UDT$MyDenseVector"  (was: Local UDTs test fails due to 
"ClassCastException: java.lang.Integer cannot be cast to 
org.apache.spark.sql.UDT$MyDenseVector")

> Local UDTs test (org.apache.spark.sql.UserDefinedTypeSuite) fails due to 
> "ClassCastException: java.lang.Integer cannot be cast to 
> org.apache.spark.sql.UDT$MyDenseVector"
> -
>
> Key: SPARK-18914
> URL: https://issues.apache.org/jira/browse/SPARK-18914
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 2.2.0
>Reporter: Jacek Laskowski
>Priority: Minor
>
> {code}
> [info] ScalaTest
> [info] Run completed in 1 hour, 44 minutes, 27 seconds.
> [info] Total number of tests run: 2710
> [info] Suites: completed 173, aborted 0
> [info] Tests: succeeded 2709, failed 1, canceled 0, ignored 53, pending 0
> [info] *** 1 TEST FAILED ***
> [error] Failed: Total 2710, Failed 1, Errors 0, Passed 2709, Ignored 53
> [error] Failed tests:
> [error]   org.apache.spark.sql.UserDefinedTypeSuite
> {code}
> {code}
> [info] - Local UDTs *** FAILED *** (110 milliseconds)
> [info]   java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> org.apache.spark.sql.UDT$MyDenseVector
> [info]   at 
> org.apache.spark.sql.UserDefinedTypeSuite$$anonfun$9.apply(UserDefinedTypeSuite.scala:135)
> [info]   at 
> org.apache.spark.sql.UserDefinedTypeSuite$$anonfun$9.apply(UserDefinedTypeSuite.scala:129)
> [info]   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
> [info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
> [info]   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
> [info]   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68)
> [info]   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:183)
> [info]   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
> [info]   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
> [info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
> [info]   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:196)
> [info]   at 
> org.apache.spark.sql.UserDefinedTypeSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(UserDefinedTypeSuite.scala:74)
> [info]   at 
> org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:221)
> [info]   at 
> org.apache.spark.sql.UserDefinedTypeSuite.runTest(UserDefinedTypeSuite.scala:74)
> [info]   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
> [info]   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
> [info]   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396)
> [info]   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384)
> [info]   at scala.collection.immutable.List.foreach(List.scala:381)
> [info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384)
> [info]   at 
> org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:379)
> [info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:461)
> [info]   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:229)
> [info]   at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
> [info]   at org.scalatest.Suite$class.run(Suite.scala:1147)
> [info]   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
> [info]   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
> [info]   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
> [info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:521)
> [info]   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:233)
> [info]   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:31)
> [info]   at 
> org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:213)
> [info]   at 
> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:210)
> [info]   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:31)
> [info]   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$

[jira] [Updated] (SPARK-18914) Local UDTs test fails due to "ClassCastException: java.lang.Integer cannot be cast to org.apache.spark.sql.UDT$MyDenseVector"

2016-12-17 Thread Jacek Laskowski (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacek Laskowski updated SPARK-18914:

Description: 
{code}
[info] ScalaTest
[info] Run completed in 1 hour, 44 minutes, 27 seconds.
[info] Total number of tests run: 2710
[info] Suites: completed 173, aborted 0
[info] Tests: succeeded 2709, failed 1, canceled 0, ignored 53, pending 0
[info] *** 1 TEST FAILED ***
[error] Failed: Total 2710, Failed 1, Errors 0, Passed 2709, Ignored 53
[error] Failed tests:
[error] org.apache.spark.sql.UserDefinedTypeSuite
{code}

{code}
[info] - Local UDTs *** FAILED *** (110 milliseconds)
[info]   java.lang.ClassCastException: java.lang.Integer cannot be cast to 
org.apache.spark.sql.UDT$MyDenseVector
[info]   at 
org.apache.spark.sql.UserDefinedTypeSuite$$anonfun$9.apply(UserDefinedTypeSuite.scala:135)
[info]   at 
org.apache.spark.sql.UserDefinedTypeSuite$$anonfun$9.apply(UserDefinedTypeSuite.scala:129)
[info]   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
[info]   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68)
[info]   at 
org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:183)
[info]   at 
org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
[info]   at 
org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
[info]   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:196)
[info]   at 
org.apache.spark.sql.UserDefinedTypeSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(UserDefinedTypeSuite.scala:74)
[info]   at 
org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:221)
[info]   at 
org.apache.spark.sql.UserDefinedTypeSuite.runTest(UserDefinedTypeSuite.scala:74)
[info]   at 
org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
[info]   at 
org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
[info]   at 
org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396)
[info]   at 
org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384)
[info]   at scala.collection.immutable.List.foreach(List.scala:381)
[info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384)
[info]   at 
org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:379)
[info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:461)
[info]   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:229)
[info]   at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
[info]   at org.scalatest.Suite$class.run(Suite.scala:1147)
[info]   at 
org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
[info]   at 
org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
[info]   at 
org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
[info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:521)
[info]   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:233)
[info]   at 
org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:31)
[info]   at 
org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:213)
[info]   at 
org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:210)
[info]   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:31)
[info]   at 
org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:314)
[info]   at 
org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:480)
[info]   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
[info]   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
[info]   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[info]   at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[info]   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[info]   at java.lang.Thread.run(Thread.java:745)
{code}

  was:
{code}
[info] - Local UDTs *** FAILED *** (110 milliseconds)
[info]   java.lang.ClassCastException: java.lang.Integer cannot be cast to 
org.apache.spark.sql.UDT$MyDenseVector
[info]   at 
org.apache.spark.sql.UserDefinedTypeSuite$$anonfun$9.apply(UserDefinedTypeSuite.scala:135)
[info]   at 
org.apache.spark.sql.UserDefinedTypeSuite$$anonfun$9.apply(UserDefinedTypeSuite.scala:129)
[info]   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest

[jira] [Commented] (SPARK-18914) Local UDTs test (org.apache.spark.sql.UserDefinedTypeSuite) fails due to "ClassCastException: java.lang.Integer cannot be cast to org.apache.spark.sql.UDT$MyDenseVecto

2016-12-17 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15757061#comment-15757061
 ] 

Sean Owen commented on SPARK-18914:
---

Where does this fail? It doesn't fail on Jenkins builds, evidently.

> Local UDTs test (org.apache.spark.sql.UserDefinedTypeSuite) fails due to 
> "ClassCastException: java.lang.Integer cannot be cast to 
> org.apache.spark.sql.UDT$MyDenseVector"
> -
>
> Key: SPARK-18914
> URL: https://issues.apache.org/jira/browse/SPARK-18914
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 2.2.0
>Reporter: Jacek Laskowski
>Priority: Minor
>
> {code}
> [info] ScalaTest
> [info] Run completed in 1 hour, 44 minutes, 27 seconds.
> [info] Total number of tests run: 2710
> [info] Suites: completed 173, aborted 0
> [info] Tests: succeeded 2709, failed 1, canceled 0, ignored 53, pending 0
> [info] *** 1 TEST FAILED ***
> [error] Failed: Total 2710, Failed 1, Errors 0, Passed 2709, Ignored 53
> [error] Failed tests:
> [error]   org.apache.spark.sql.UserDefinedTypeSuite
> {code}
> {code}
> [info] - Local UDTs *** FAILED *** (110 milliseconds)
> [info]   java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> org.apache.spark.sql.UDT$MyDenseVector
> [info]   at 
> org.apache.spark.sql.UserDefinedTypeSuite$$anonfun$9.apply(UserDefinedTypeSuite.scala:135)
> [info]   at 
> org.apache.spark.sql.UserDefinedTypeSuite$$anonfun$9.apply(UserDefinedTypeSuite.scala:129)
> [info]   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
> [info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
> [info]   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
> [info]   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68)
> [info]   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:183)
> [info]   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
> [info]   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
> [info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
> [info]   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:196)
> [info]   at 
> org.apache.spark.sql.UserDefinedTypeSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(UserDefinedTypeSuite.scala:74)
> [info]   at 
> org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:221)
> [info]   at 
> org.apache.spark.sql.UserDefinedTypeSuite.runTest(UserDefinedTypeSuite.scala:74)
> [info]   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
> [info]   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
> [info]   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396)
> [info]   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384)
> [info]   at scala.collection.immutable.List.foreach(List.scala:381)
> [info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384)
> [info]   at 
> org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:379)
> [info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:461)
> [info]   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:229)
> [info]   at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
> [info]   at org.scalatest.Suite$class.run(Suite.scala:1147)
> [info]   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
> [info]   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
> [info]   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
> [info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:521)
> [info]   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:233)
> [info]   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:31)
> [info]   at 
> org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:213)
> [info]   at 
> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:210)
> [info]   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:31)
> [info]   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:314)
> [info]   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:480)
> [info]   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
> [info]   at sbt.ForkMain$Run

[jira] [Resolved] (SPARK-18829) Printing to logger

2016-12-17 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-18829.
---
Resolution: Not A Problem

I think the existing API provides all this info and the caller can log it as 
desired.
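
As an illustration of that point, a minimal caller-side sketch (assuming Spark 1.6+, an 
slf4j logger, and an existing DataFrame {{df}}):

{code}
// Caller-side logging of plan text and show() output; illustrative only.
import org.slf4j.LoggerFactory
val logger = LoggerFactory.getLogger("query-plans")

// df.queryExecution.toString carries the parsed/analyzed/optimized/physical plans,
// i.e. the same information explain(true) prints:
logger.debug(df.queryExecution.toString)

// show() prints to stdout; capture that output explicitly if it must go to a log:
val buffer = new java.io.ByteArrayOutputStream()
Console.withOut(buffer) { df.show(20, false) }
logger.debug(buffer.toString("UTF-8"))
{code}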

> Printing to logger
> --
>
> Key: SPARK-18829
> URL: https://issues.apache.org/jira/browse/SPARK-18829
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.6.2
> Environment: ALL
>Reporter: David Hodeffi
>Priority: Trivial
>  Labels: easyfix, patch
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I would like to print the output of {{dataframe.show}} or {{df.explain(true)}} into a 
> log file. Right now the code prints to standard output without a way to redirect it, 
> and it cannot be configured in log4j.properties.
> My suggestion is to write to both the logger and standard output, e.g.:
> class DataFrame { ...
>   override def explain(extended: Boolean): Unit = {
>     val explain = ExplainCommand(queryExecution.logical, extended = extended)
>     // scalastyle:off println
>     sqlContext.executePlan(explain).executedPlan.executeCollect().foreach { r =>
>       println(r.getString(0))
>       logger.debug(r.getString(0))
>     }
>     // scalastyle:on println
>   }
>   def show(numRows: Int, truncate: Boolean): Unit = {
>     val str = showString(numRows, truncate)
>     println(str)
>     logger.debug(str)
>   }
> }



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18914) Local UDTs test (org.apache.spark.sql.UserDefinedTypeSuite) fails due to "ClassCastException: java.lang.Integer cannot be cast to org.apache.spark.sql.UDT$MyDenseVecto

2016-12-17 Thread Jacek Laskowski (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15757105#comment-15757105
 ] 

Jacek Laskowski commented on SPARK-18914:
-

That's mysterious indeed, as it fails consistently on my laptop with no local 
changes other than the ScalaTest 3.1.0 upgrade (in https://github.com/apache/spark/pull/16309).

> Local UDTs test (org.apache.spark.sql.UserDefinedTypeSuite) fails due to 
> "ClassCastException: java.lang.Integer cannot be cast to 
> org.apache.spark.sql.UDT$MyDenseVector"
> -
>
> Key: SPARK-18914
> URL: https://issues.apache.org/jira/browse/SPARK-18914
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 2.2.0
>Reporter: Jacek Laskowski
>Priority: Minor
>
> {code}
> [info] ScalaTest
> [info] Run completed in 1 hour, 44 minutes, 27 seconds.
> [info] Total number of tests run: 2710
> [info] Suites: completed 173, aborted 0
> [info] Tests: succeeded 2709, failed 1, canceled 0, ignored 53, pending 0
> [info] *** 1 TEST FAILED ***
> [error] Failed: Total 2710, Failed 1, Errors 0, Passed 2709, Ignored 53
> [error] Failed tests:
> [error]   org.apache.spark.sql.UserDefinedTypeSuite
> {code}
> {code}
> [info] - Local UDTs *** FAILED *** (110 milliseconds)
> [info]   java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> org.apache.spark.sql.UDT$MyDenseVector
> [info]   at 
> org.apache.spark.sql.UserDefinedTypeSuite$$anonfun$9.apply(UserDefinedTypeSuite.scala:135)
> [info]   at 
> org.apache.spark.sql.UserDefinedTypeSuite$$anonfun$9.apply(UserDefinedTypeSuite.scala:129)
> [info]   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
> [info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
> [info]   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
> [info]   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68)
> [info]   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:183)
> [info]   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
> [info]   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:196)
> [info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
> [info]   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:196)
> [info]   at 
> org.apache.spark.sql.UserDefinedTypeSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(UserDefinedTypeSuite.scala:74)
> [info]   at 
> org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:221)
> [info]   at 
> org.apache.spark.sql.UserDefinedTypeSuite.runTest(UserDefinedTypeSuite.scala:74)
> [info]   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
> [info]   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:229)
> [info]   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:396)
> [info]   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:384)
> [info]   at scala.collection.immutable.List.foreach(List.scala:381)
> [info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384)
> [info]   at 
> org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:379)
> [info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:461)
> [info]   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:229)
> [info]   at org.scalatest.FunSuite.runTests(FunSuite.scala:1560)
> [info]   at org.scalatest.Suite$class.run(Suite.scala:1147)
> [info]   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1560)
> [info]   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
> [info]   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:233)
> [info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:521)
> [info]   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:233)
> [info]   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterAll$$super$run(SparkFunSuite.scala:31)
> [info]   at 
> org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:213)
> [info]   at 
> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:210)
> [info]   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:31)
> [info]   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:314)
> [info]   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Frame

[jira] [Commented] (SPARK-18648) spark-shell --jars option does not add jars to classpath on windows

2016-12-17 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15757178#comment-15757178
 ] 

Hyukjin Kwon commented on SPARK-18648:
--

Thank you for cc'ing me.

Yes, I think {{textFile}} seems irrelevant. Also, yes, I think the safe choice 
is always to use a URI format on Windows where applicable.
I just manually tested and I can reproduce this on Windows although I see these 
seem parsed fine as below with {{--jars}}.

{code}

C:\spark\bin>spark-shell.cmd --jars 
C:\Users\IEUser\Downloads\spark-redshift_2.11-3.0.0-preview1.jar --verbose
...
  jars
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
...
{code}

{code}
C:\spark\bin>spark-shell.cmd --jars 
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar --verbose
  ...
  jars
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
  ...
{code}

{code}
C:\spark\bin>spark-shell.cmd --jars 
/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar  --verbose
...
  jars
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
...
{code} 

In those cases, it seems I can't import the contents of the jar on Windows, 
whereas I can on my Mac, unless I have done something wrong.

Personally, in my case, I always use {{--driver-library-path XXX.jar}} and 
{{--conf spark.executor.extraLibraryPath=XXX.jar}} in {{spark-shell}} or 
{{spark-shell.cmd}} for sure.

or

{code}
spark.driver.extraClassPath 
spark.executor.extraClassPath 
{code}

in the configuration.

Anyway, I believe the {{C:/a/b/c}} form below is discouraged because I have 
seen it fail to be parsed properly as a URI, ending up with {{C:}} being 
recognised as the scheme. 

{code}
C:\spark\bin>spark-shell.cmd --jars 
C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar --verbose
...
  jars
C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
...
{code}


Actually, I noticed such suspicious cases across Spark, so I wanted to fix 
them after making all the tests pass on Windows.


So, to cut this short,

- I can reproduce this on Windows
- A workaround seems to be setting {{spark.driver.extraClassPath}} and 
{{spark.executor.extraClassPath}} as well (or equivalent options); see the 
sketch after this list.
- The right paths should be URI forms as suggested above.
- I think we should investigate this. 
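
A sketch of that workaround on Windows, combining a URI-form {{--jars}} with 
explicit class path settings (the jar path is just the one from the examples 
above; adjust as needed):

{code}
C:\spark\bin>spark-shell.cmd ^
  --jars file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar ^
  --driver-class-path C:\Users\IEUser\Downloads\spark-redshift_2.11-3.0.0-preview1.jar ^
  --conf spark.executor.extraClassPath=C:\Users\IEUser\Downloads\spark-redshift_2.11-3.0.0-preview1.jar ^
  --verbose
{code}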




> spark-shell --jars option does not add jars to classpath on windows
> ---
>
> Key: SPARK-18648
> URL: https://issues.apache.org/jira/browse/SPARK-18648
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell, Windows
>Affects Versions: 2.0.2
> Environment: Windows 7 x64
>Reporter: Michel Lemay
>  Labels: windows
>
> I can't import symbols from command line jars when in the shell:
> Adding jars via --jars:
> {code}
> spark-shell --master local[*] --jars path\to\deeplearning4j-core-0.7.0.jar
> {code}
> Same result if I add it through maven coordinates:
> {code}spark-shell --master local[*] --packages 
> org.deeplearning4j:deeplearning4j-core:0.7.0
> {code}
> I end up with:
> {code}
> scala> import org.deeplearning4j
> :23: error: object deeplearning4j is not a member of package org
>import org.deeplearning4j
> {code}
> NOTE: It is working as expected when running on linux.
> Sample output with --verbose:
> {code}
> Using properties file: null
> Parsed arguments:
>   master  local[*]
>   deployMode  null
>   executorMemory  null
>   executorCores   null
>   totalExecutorCores  null
>   propertiesFile  null
>   driverMemorynull
>   driverCores null
>   driverExtraClassPathnull
>   driverExtraLibraryPath  null
>   driverExtraJavaOptions  null
>   supervise   false
>   queue   null
>   numExecutorsnull
>   files   null
>   pyFiles null
>   archivesnull
>   mainClass   org.apache.spark.repl.Main
>   primaryResource spark-shell
>   nameSpark shell
>   childArgs   []
>   jars
> file:/C:/Apps/Spark/spark-2.0.2-bin-hadoop2.4/bin/../deeplearning4j-core-0.7.0.jar
>   packagesnull
>   packagesExclusions  null
>   repositoriesnull
>   verbose true
> Spark properties used, including those specified through
>  --conf and those from the properties file null:
> Main class:
> org.apache.spark.repl.Main
> Arguments:
> System properties:
> SPARK_SUBMIT -> true
> spark.app.name -> Spark shell
> spark.jars -> 
> file:/C:/Apps/Spark/spark-2.0.2-bin-hadoop2.4/bin/../deeplearning4j-core-0.7.0.jar
> spark.submit.de

[jira] [Comment Edited] (SPARK-18648) spark-shell --jars option does not add jars to classpath on windows

2016-12-17 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15757178#comment-15757178
 ] 

Hyukjin Kwon edited comment on SPARK-18648 at 12/17/16 3:21 PM:


Thank you for cc'ing me.

Yes, I think {{textFile}} seems irrelevant. Also, yes, I think the safe choice 
is always to use a URI format on Windows where applicable.
I just manually tested and I can reproduce this on Windows although I see these 
seem parsed fine as below with {{--jars}}.

{code}

C:\spark\bin>spark-shell.cmd --jars 
C:\Users\IEUser\Downloads\spark-redshift_2.11-3.0.0-preview1.jar --verbose
...
  jars
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
...
{code}

{code}
C:\spark\bin>spark-shell.cmd --jars 
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar --verbose
  ...
  jars
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
  ...
{code}

{code}
C:\spark\bin>spark-shell.cmd --jars 
/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar  --verbose
...
  jars
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
...
{code} 

In those cases, it seems I can't import the contents of the jar on Windows, 
whereas I can on my Mac, unless I have done something wrong.

Personally, in my case, I always use {{--driver-library-path XXX.jar}} and 
{{--conf spark.executor.extraLibraryPath=XXX.jar}} in {{spark-shell}} or 
{{spark-shell.cmd}} for sure.

or

{code}
spark.driver.extraClassPath 
spark.executor.extraClassPath 
{code}

in the configuration.

Anyway, I believe the {{C:/a/b/c}} form below is discouraged because I have 
seen it fail to be parsed properly as a URI, ending up with {{C:}} being 
recognised as the scheme. 

{code}
C:\spark\bin>spark-shell.cmd --jars 
C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar --verbose
...
  jars
C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
...
{code}


Actually, I noticed such suspicious cases across Spark, so I wanted to fix 
them after making all the tests pass on Windows.


So, to cut this short,

- I can reproduce this on Windows
- A workaround seems to be setting {{spark.driver.extraClassPath}} and 
{{spark.executor.extraClassPath}} as well (or equivalent options).
- The right paths should be URI forms as suggested above.
- I think we should investigate this. 





was (Author: hyukjin.kwon):
Thank you for cc'ing me.

Yes, I think {{textFile}} seems irrelevant. Also, yes, I think the safe choice 
is always to use a URI format on Windows where applicable.
I just manually tested and I can reproduce this on Windows although I see these 
seem parsed fine as below with {{--jars}}.

{code}

C:\spark\bin>spark-shell.cmd --jars 
C:\Users\IEUser\Downloads\spark-redshift_2.11-3.0.0-preview1.jar --verbose
...
  jars
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
...
{code}

{code}
C:\spark\bin>spark-shell.cmd --jars 
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar --verbose
  ...
  jars
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
  ...
{code}

{code}
C:\spark\bin>spark-shell.cmd --jars 
/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar  --verbose
...
  jars
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
...
{code} 

In those cases, it seems I can't import the contents of the jar on Windows, 
whereas I can on my Mac, unless I have done something wrong.

Personally, in my case, I always use

"--driver-library-path XXX.jar" and "--conf 
spark.executor.extraLibraryPath=XXX.jar" in {{spark-shell}} or 
{{spark-shell.cmd}} for sure.

or

{code}
spark.driver.extraClassPath 
spark.executor.extraClassPath 
{code}

in the configuration.

Anyway, I believe the {{C:/a/b/c}} form below is discouraged because I have 
seen it fail to be parsed properly as a URI, ending up with {{C:}} being 
recognised as the scheme. 

{code}
C:\spark\bin>spark-shell.cmd --jars 
C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar --verbose
...
  jars
C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
...
{code}


Actually, I noticed such suspicious cases across Spark, so I wanted to fix 
them after making all the tests pass on Windows.


So, to cut this short,

- I can reproduce this on Windows
- Workaround seems setting {{spark.driver.extraClassPath}} and 
{{spark.executor.extraClassPath}} too (or equivalent options).
- The right paths should be URI forms as suggested above.
- I think we should investigate this. 




> spark-shell --jars option does not add jars to classpath on windows
> ---
>
>

[jira] [Comment Edited] (SPARK-18648) spark-shell --jars option does not add jars to classpath on windows

2016-12-17 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15757178#comment-15757178
 ] 

Hyukjin Kwon edited comment on SPARK-18648 at 12/17/16 3:20 PM:


Thank you for cc'ing me.

Yes, I think {{textFile}} seems irrelevant. Also, yes, I think the safe choice 
is always to use a URI format on Windows where applicable.
I just manually tested and I can reproduce this on Windows although I see these 
seem parsed fine as below with {{--jars}}.

{code}

C:\spark\bin>spark-shell.cmd --jars 
C:\Users\IEUser\Downloads\spark-redshift_2.11-3.0.0-preview1.jar --verbose
...
  jars
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
...
{code}

{code}
C:\spark\bin>spark-shell.cmd --jars 
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar --verbose
  ...
  jars
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
  ...
{code}

{code}
C:\spark\bin>spark-shell.cmd --jars 
/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar  --verbose
...
  jars
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
...
{code} 

In those cases, it seems I can't import the contents of the jar on Windows, 
whereas I can on my Mac, unless I have done something wrong.

Personally, in my case, I always use

"--driver-library-path XXX.jar" and "--conf 
spark.executor.extraLibraryPath=XXX.jar" in {{spark-shell}} or 
{{spark-shell.cmd}} for sure.

or

{code}
spark.driver.extraClassPath 
spark.executor.extraClassPath 
{code}

in the configuration.

Anyway, I believe the {{C:/a/b/c}} form below is discouraged because I have 
seen it fail to be parsed properly as a URI, ending up with {{C:}} being 
recognised as the scheme. 

{code}
C:\spark\bin>spark-shell.cmd --jars 
C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar --verbose
...
  jars
C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
...
{code}


Actually, I noticed such suspicious cases across Spark, so I wanted to fix 
them after making all the tests pass on Windows.


So, to cut this short,

- I can reproduce this on Windows
- A workaround seems to be setting {{spark.driver.extraClassPath}} and 
{{spark.executor.extraClassPath}} as well (or equivalent options).
- The right paths should be URI forms as suggested above.
- I think we should investigate this. 





was (Author: hyukjin.kwon):
Thank you for cc'ing me.

Yes, I think {{textFile}} seems irrelevant. Also, yes, I think the safe choice 
is always to use a URI format on Windows where applicable.
I just manually tested and I can reproduce this on Windows although I see these 
seem parsed fine as below with {{--jars}}.

{code}

C:\spark\bin>spark-shell.cmd --jars 
C:\Users\IEUser\Downloads\spark-redshift_2.11-3.0.0-preview1.jar --verbose
...
  jars
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
...
{code}

{code}
C:\spark\bin>spark-shell.cmd --jars 
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar --verbose
  ...
  jars
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
  ...
{code}

{code}
C:\spark\bin>spark-shell.cmd --jars 
/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar  --verbose
...
  jars
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
...
{code} 

In those cases, it seems I can't import the contents of the jar on Windows, 
whereas I can on my Mac, unless I have done something wrong.

Personally, in my case, I always use

{{--driver-library-path XXX.jar}} and {{--conf 
spark.executor.extraLibraryPath=XXX.jar}} in `spark-shell` or `spark-shell.cmd` 
for sure.

or

{code}
spark.driver.extraClassPath 
spark.executor.extraClassPath 
{code}

in the configuration.

Anyway, I believe the {{C:/a/b/c}} form below is discouraged because I have 
seen it fail to be parsed properly as a URI, ending up with {{C:}} being 
recognised as the scheme. 

{code}
C:\spark\bin>spark-shell.cmd --jars 
C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar --verbose
...
  jars
C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
...
{code}


Actually, I noticed such suspicious cases across Spark, so I wanted to fix 
them after making all the tests pass on Windows.


So, to cut this short,

- I can reproduce this on Windows
- Workaround seems setting {{spark.driver.extraClassPath}} and 
{{spark.executor.extraClassPath}} too (or equivalent options).
- The right paths should be URI forms as suggested above.
- I think we should investigate this. 




> spark-shell --jars option does not add jars to classpath on windows
> ---
>
>   

[jira] [Comment Edited] (SPARK-18648) spark-shell --jars option does not add jars to classpath on windows

2016-12-17 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15757178#comment-15757178
 ] 

Hyukjin Kwon edited comment on SPARK-18648 at 12/17/16 3:24 PM:


Thank you for cc'ing me.

Yes, I think {{textFile}} seems irrelevant. Also, yes, I think the safe choice 
is always to use a URI format on Windows where applicable.
I just manually tested and I can reproduce this on Windows although I see these 
seem parsed fine as below with {{--jars}}.

{code}

C:\spark\bin>spark-shell.cmd --jars 
C:\Users\IEUser\Downloads\spark-redshift_2.11-3.0.0-preview1.jar --verbose
...
  jars
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
...
{code}

{code}
C:\spark\bin>spark-shell.cmd --jars 
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar --verbose
  ...
  jars
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
  ...
{code}

{code}
C:\spark\bin>spark-shell.cmd --jars 
/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar  --verbose
...
  jars
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
...
{code} 

In those cases, it seems I can't import the contents of the jar on Windows, 
whereas I can on my Mac, unless I have done something wrong.

Personally, in my case, I always use {{driver-library-path}} and 
{{spark.executor.extraLibraryPath}} in {{spark-shell}} or {{spark-shell.cmd}} 
for sure.

or

{code}
spark.driver.extraLibraryPath 
spark.executor.extraLibraryPath 
{code}

in the configuration.

Anyway, I believe the {{C:/a/b/c}} form below is discouraged because I have 
seen it fail to be parsed properly as a URI, ending up with {{C:}} being 
recognised as the scheme. 

{code}
C:\spark\bin>spark-shell.cmd --jars 
C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar --verbose
...
  jars
C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
...
{code}


Actually, I noticed such suspicious cases across Spark, so I wanted to fix 
them after making all the tests pass on Windows.


So, to cut this short,

- I can reproduce this on Windows
- A workaround seems to be setting {{spark.driver.extraClassPath}} and 
{{spark.executor.extraClassPath}} as well (or equivalent options).
- The right paths should be URI forms as suggested above.
- I think we should investigate this. 





was (Author: hyukjin.kwon):
Thank you for cc'ing me.

Yes, I think {{textFile}} seems irrelevant. Also, yes, I think the safe choice 
is always to use a URI format on Windows where applicable.
I just manually tested and I can reproduce this on Windows although I see these 
seem parsed fine as below with {{--jars}}.

{code}

C:\spark\bin>spark-shell.cmd --jars 
C:\Users\IEUser\Downloads\spark-redshift_2.11-3.0.0-preview1.jar --verbose
...
  jars
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
...
{code}

{code}
C:\spark\bin>spark-shell.cmd --jars 
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar --verbose
  ...
  jars
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
  ...
{code}

{code}
C:\spark\bin>spark-shell.cmd --jars 
/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar  --verbose
...
  jars
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
...
{code} 

In those cases, it seems I can't import the contents of the jar on Windows, 
whereas I can on my Mac, unless I have done something wrong.

Personally, in my case, I always use {{--driver-library-path XXX.jar}} and 
{{--conf spark.executor.extraLibraryPath=XXX.jar}} in {{spark-shell}} or 
{{spark-shell.cmd}} for sure.

or

{code}
spark.driver.extraClassPath 
spark.executor.extraClassPath 
{code}

in the configuration.

Anyway, I believe the {{C:/a/b/c}} form below is discouraged because I have 
seen it fail to be parsed properly as a URI, ending up with {{C:}} being 
recognised as the scheme. 

{code}
C:\spark\bin>spark-shell.cmd --jars 
C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar --verbose
...
  jars
C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
...
{code}


Actually, I noticed such suspicious cases across Spark, so I wanted to fix 
them after making all the tests pass on Windows.


So, to cut this short,

- I can reproduce this on Windows
- Workaround seems setting {{spark.driver.extraClassPath}} and 
{{spark.executor.extraClassPath}} too (or equivalent options).
- The right paths should be URI forms as suggested above.
- I think we should investigate this. 




> spark-shell --jars option does not add jars to classpath on windows
> ---
>
> Key: SPARK-18

[jira] [Comment Edited] (SPARK-18648) spark-shell --jars option does not add jars to classpath on windows

2016-12-17 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15757178#comment-15757178
 ] 

Hyukjin Kwon edited comment on SPARK-18648 at 12/17/16 3:29 PM:


Thank you for cc'ing me.

Yes, I think {{textFile}} seems irrelevant. Also, yes, I think the safe choice 
is always to use a URI format on Windows where applicable.
I just manually tested and I can reproduce this on Windows although I see these 
seem parsed fine as below with {{--jars}}.

{code}

C:\spark\bin>spark-shell.cmd --jars 
C:\Users\IEUser\Downloads\spark-redshift_2.11-3.0.0-preview1.jar --verbose
...
  jars
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
...
{code}

{code}
C:\spark\bin>spark-shell.cmd --jars 
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar --verbose
  ...
  jars
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
  ...
{code}

{code}
C:\spark\bin>spark-shell.cmd --jars 
/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar  --verbose
...
  jars
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
...
{code} 

In those cases, it seems I can't import the contents of the jar on Windows, 
whereas I can on my Mac, unless I have done something wrong.

Personally, in my case, I always use options related with path and classpaths 
such as {{driver-library-path}} and {{spark.executor.extraLibraryPath}} in 
{{spark-shell}} or {{spark-shell.cmd}} for sure.

or

{code}
spark.driver.extraClassPath
spark.driver.extraLibraryPath 
spark.executor.extraClassPath 
spark.executor.extraLibraryPath 
{code}

in the configuration.

Anyway, I believe the {{C:/a/b/c}} form below is discouraged because I have 
seen it fail to be parsed properly as a URI, ending up with {{C:}} being 
recognised as the scheme. 

{code}
C:\spark\bin>spark-shell.cmd --jars 
C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar --verbose
...
  jars
C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
...
{code}


Actually, I noticed such suspicious cases across Spark, so I wanted to fix 
them after making all the tests pass on Windows.


So, to cut this short,

- I can reproduce this on Windows
- A workaround seems to be setting options such as {{spark.driver.extraClassPath}} 
and {{spark.executor.extraClassPath}} (or their equivalents).
- The right paths should be URI forms as suggested above.
- I think we should investigate this. 





was (Author: hyukjin.kwon):
Thank you for cc'ing me.

Yes, I think {{textFile}} seems irrelevant. Also, yes, I think the safe choice 
is always to use a URI format on Windows where applicable.
I just manually tested and I can reproduce this on Windows although I see these 
seem parsed fine as below with {{--jars}}.

{code}

C:\spark\bin>spark-shell.cmd --jars 
C:\Users\IEUser\Downloads\spark-redshift_2.11-3.0.0-preview1.jar --verbose
...
  jars
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
...
{code}

{code}
C:\spark\bin>spark-shell.cmd --jars 
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar --verbose
  ...
  jars
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
  ...
{code}

{code}
C:\spark\bin>spark-shell.cmd --jars 
/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar  --verbose
...
  jars
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
...
{code} 

In those cases, it seems I can't import the contents of the jar on Windows, 
whereas I can on my Mac, unless I have done something wrong.

Personally, in my case, I always use {{driver-library-path}} and 
{{spark.executor.extraLibraryPath}} in {{spark-shell}} or {{spark-shell.cmd}} 
for sure.

or

{code}
spark.driver.extraLibraryPath 
spark.executor.extraLibraryPath 
{code}

in the configuration.

Anyway, I believe the {{C:/a/b/c}} form below is discouraged because I have 
seen it fail to be parsed properly as a URI, ending up with {{C:}} being 
recognised as the scheme. 

{code}
C:\spark\bin>spark-shell.cmd --jars 
C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar --verbose
...
  jars
C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
...
{code}


Actually, I noticed such suspicious cases across Spark, so I wanted to fix 
them after making all the tests pass on Windows.


So, to cut this short,

- I can reproduce this on Windows
- Workaround seems setting {{spark.driver.extraClassPath}} and 
{{spark.executor.extraClassPath}} too (or equivalent options).
- The right paths should be URI forms as suggested above.
- I think we should investigate this. 




> spark-shell --jars option does not add jars to classpath on windows
> --

[jira] [Comment Edited] (SPARK-18648) spark-shell --jars option does not add jars to classpath on windows

2016-12-17 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15757178#comment-15757178
 ] 

Hyukjin Kwon edited comment on SPARK-18648 at 12/17/16 3:45 PM:


Thank you for cc'ing me.

Yes, I think {{textFile}} seems irrelevant. Also, yes, I think the safe choice 
is always to use a URI format on Windows where applicable.
I just manually tested and I can reproduce this on Windows although I see these 
seem parsed fine as below with {{--jars}}.

{code}
C:\spark\bin>spark-shell.cmd --jars 
C:\Users\IEUser\Downloads\spark-redshift_2.11-3.0.0-preview1.jar --verbose
...
  jars
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
...
{code}

{code}
C:\spark\bin>spark-shell.cmd --jars 
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar --verbose
  ...
  jars
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
  ...
{code}

{code}
C:\spark\bin>spark-shell.cmd --jars 
/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar  --verbose
...
  jars
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
...
{code} 

In those cases, it seems I can't import the contents of the jar on Windows, 
whereas I can on my Mac, unless I have done something wrong.

Personally, in my case, I always use options related with path and classpaths 
such as {{driver-library-path}} and {{spark.executor.extraLibraryPath}} in 
{{spark-shell}} or {{spark-shell.cmd}} for sure (I use JNI).

or

{code}
spark.driver.extraClassPath
spark.driver.extraLibraryPath 
spark.executor.extraClassPath 
spark.executor.extraLibraryPath 
{code}

in the configuration.

Anyway, I believe the {{C:/a/b/c}} form below is discouraged because I have 
seen it fail to be parsed properly as a URI, ending up with {{C:}} being 
recognised as the scheme. 

{code}
C:\spark\bin>spark-shell.cmd --jars 
C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar --verbose
...
  jars
C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
...
{code}


Actually, I noticed such suspicious cases across Spark, so I wanted to fix 
them after making all the tests pass on Windows.


So, to cut this short,

- I can reproduce this on Windows
- A workaround seems to be setting options such as {{spark.driver.extraClassPath}} 
and {{spark.executor.extraClassPath}} (or their equivalents).
- The right paths should be URI forms as suggested above.
- I think we should investigate this. 





was (Author: hyukjin.kwon):
Thank you for cc'ing me.

Yes, I think {{textFile}} seems irrelevant. Also, yes, I think the safe choice 
is always to use a URI format on Windows where applicable.
I just manually tested and I can reproduce this on Windows although I see these 
seem parsed fine as below with {{--jars}}.

{code}

C:\spark\bin>spark-shell.cmd --jars 
C:\Users\IEUser\Downloads\spark-redshift_2.11-3.0.0-preview1.jar --verbose
...
  jars
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
...
{code}

{code}
C:\spark\bin>spark-shell.cmd --jars 
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar --verbose
  ...
  jars
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
  ...
{code}

{code}
C:\spark\bin>spark-shell.cmd --jars 
/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar  --verbose
...
  jars
file:/C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
...
{code} 

In those cases, it seems I can't import the contents of the jar on Windows, 
whereas I can on my Mac, unless I have done something wrong.

Personally, in my case, I always use options related with path and classpaths 
such as {{driver-library-path}} and {{spark.executor.extraLibraryPath}} in 
{{spark-shell}} or {{spark-shell.cmd}} for sure.

or

{code}
spark.driver.extraClassPath
spark.driver.extraLibraryPath 
spark.executor.extraClassPath 
spark.executor.extraLibraryPath 
{code}

in the configuration.

Anyway, I believe the {{C:/a/b/c}} form below is discouraged because I have 
seen it fail to be parsed properly as a URI, ending up with {{C:}} being 
recognised as the scheme. 

{code}
C:\spark\bin>spark-shell.cmd --jars 
C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar --verbose
...
  jars
C:/Users/IEUser/Downloads/spark-redshift_2.11-3.0.0-preview1.jar 
...
{code}


Actually, I noticed such suspicious cases across Spark, so I wanted to fix 
them after making all the tests pass on Windows.


So, to cut this short,

- I can reproduce this on Windows
- Workaround seems setting options such as {{spark.driver.extraClassPath}} and 
{{spark.executor.extraClassPath}} (or equivalent options).
- The right paths should be URI forms as sug

[jira] [Commented] (SPARK-18703) Insertion/CTAS against Hive Tables: Staging Directories and Data Files Not Dropped Until Normal Termination of JVM

2016-12-17 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15757476#comment-15757476
 ] 

Apache Spark commented on SPARK-18703:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/16325

> Insertion/CTAS against Hive Tables: Staging Directories and Data Files Not 
> Dropped Until Normal Termination of JVM
> --
>
> Key: SPARK-18703
> URL: https://issues.apache.org/jira/browse/SPARK-18703
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Xiao Li
>Assignee: Xiao Li
>Priority: Critical
> Fix For: 2.2.0
>
>
> Below are the files/directories generated for three inserts against a Hive 
> table:
> {noformat}
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-29_149_4298858301766472202-1
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-29_149_4298858301766472202-1/-ext-1
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-29_149_4298858301766472202-1/-ext-1/._SUCCESS.crc
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-29_149_4298858301766472202-1/-ext-1/.part-0.crc
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-29_149_4298858301766472202-1/-ext-1/_SUCCESS
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-29_149_4298858301766472202-1/-ext-1/part-0
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_454_6445008511655931341-1
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_454_6445008511655931341-1/-ext-1
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_454_6445008511655931341-1/-ext-1/._SUCCESS.crc
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_454_6445008511655931341-1/-ext-1/.part-0.crc
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_454_6445008511655931341-1/-ext-1/_SUCCESS
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_454_6445008511655931341-1/-ext-1/part-0
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_722_3388423608658711001-1
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_722_3388423608658711001-1/-ext-1
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_722_3388423608658711001-1/-ext-1/._SUCCESS.crc
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_722_3388423608658711001-1/-ext-1/.part-0.crc
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_722_3388423608658711001-1/-ext-1/_SUCCESS
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.hive-staging_hive_2016-12-03_20-56-30_722_3388423608658711001-1/-ext-1/part-0
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.part-0.crc
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/part-0
> {noformat}
> The first 18 files are temporary. We do not drop them until the JVM 
> terminates. If the JVM does not terminate normally, these temporary 
> files/directories will not be dropped.
> Only the last two files are needed, as shown below.
> {noformat}
> /private/var/folders/4b/sgmfldk15js406vk7lw5llzwgn/T/spark-41eaa5ce-0288-471e-bba1-09cc482813ff/.part-0.crc
> /private/v

[jira] [Commented] (SPARK-18675) CTAS for hive serde table should work for all hive versions

2016-12-17 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15757477#comment-15757477
 ] 

Apache Spark commented on SPARK-18675:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/16325

> CTAS for hive serde table should work for all hive versions
> ---
>
> Key: SPARK-18675
> URL: https://issues.apache.org/jira/browse/SPARK-18675
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
> Fix For: 2.2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18808) ml.KMeansModel.transform is very inefficient

2016-12-17 Thread yuhao yang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15757839#comment-15757839
 ] 

yuhao yang commented on SPARK-18808:


[~FlamingMike] Are you interested in sending a fix?

> ml.KMeansModel.transform is very inefficient
> 
>
> Key: SPARK-18808
> URL: https://issues.apache.org/jira/browse/SPARK-18808
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 2.0.2
>Reporter: Michel Lemay
>
> The function ml.KMeansModel.transform will call the 
> parentModel.predict(features) method on each row, which in turn will 
> normalize all clusterCenters from mllib.KMeansModel.clusterCentersWithNorm 
> every time!
> This is a serious waste of resources! In my profiling, 
> clusterCentersWithNorm represents 99% of the sampling!
> This should have been implemented with a broadcast variable as it is done in 
> other functions like computeCost.
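
A minimal sketch of the broadcast-based approach suggested above, using only 
public APIs (assuming Spark 2.x; {{model}}, {{df}} and the "features" column 
name are illustrative):

{code}
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.functions.{col, udf}

val centers = model.clusterCenters                         // Array[Vector]
val bcCenters = df.sparkSession.sparkContext.broadcast(centers)

// Assign each row to the index of the nearest center by squared Euclidean
// distance, without recomputing any per-center state for every row.
val predictUdf = udf { features: Vector =>
  bcCenters.value.zipWithIndex
    .minBy { case (center, _) => Vectors.sqdist(features, center) }._2
}

val predicted = df.withColumn("prediction", predictUdf(col("features")))
{code}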



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18877) Unable to read given csv data. Excepion: java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20

2016-12-17 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15757924#comment-15757924
 ] 

Dongjoon Hyun commented on SPARK-18877:
---

Hi, [~Navya Krishnappa].

As you can see in the stack trace, that is a different exception, coming from 
Apache Parquet code.
{code}
Caused by: java.lang.IllegalArgumentException: Invalid DECIMAL scale: -9
at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
{code}

You can register another issue for that, but maybe in Apache Parquet JIRA.

> Unable to read given csv data. Excepion: java.lang.IllegalArgumentException: 
> requirement failed: Decimal precision 28 exceeds max precision 20
> --
>
> Key: SPARK-18877
> URL: https://issues.apache.org/jira/browse/SPARK-18877
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Navya Krishnappa
>
> When reading the below-mentioned CSV data, even though the maximum decimal 
> precision is 38, the following exception is thrown: 
> java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 
> exceeds max precision 20
> Decimal
> 2323366225312000
> 2433573971400
> 23233662253000
> 23233662253



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18877) Unable to read given csv data. Excepion: java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20

2016-12-17 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15757940#comment-15757940
 ] 

Dongjoon Hyun commented on SPARK-18877:
---

As a workaround on the Spark side, I think you can change the schema into a 
Parquet-acceptable one manually; a sketch follows below.
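
A minimal sketch of that kind of manual schema fix (the column name "Decimal" 
and the chosen precision/scale are just for illustration):

{code}
import org.apache.spark.sql.types.{DecimalType, StructField, StructType}

// Supply an explicit, Parquet-acceptable decimal type instead of relying on
// schema inference for the problematic column.
val schema = StructType(Seq(StructField("Decimal", DecimalType(38, 0))))

val df = spark.read
  .option("header", "true")
  .schema(schema)
  .csv("/path/to/input.csv")          // illustrative path

df.write.parquet("/path/to/output")   // illustrative path
{code}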

> Unable to read given csv data. Excepion: java.lang.IllegalArgumentException: 
> requirement failed: Decimal precision 28 exceeds max precision 20
> --
>
> Key: SPARK-18877
> URL: https://issues.apache.org/jira/browse/SPARK-18877
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Navya Krishnappa
>
> When reading the below-mentioned CSV data, even though the maximum decimal 
> precision is 38, the following exception is thrown: 
> java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 
> exceeds max precision 20
> Decimal
> 2323366225312000
> 2433573971400
> 23233662253000
> 23233662253



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-18877) Unable to read given csv data. Excepion: java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 exceeds max precision 20

2016-12-17 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15757924#comment-15757924
 ] 

Dongjoon Hyun edited comment on SPARK-18877 at 12/18/16 1:11 AM:
-

Hi, [~Navya Krishnappa].

As you can see in the stack trace, that is a different exception, coming from 
Apache Parquet code.
{code}
Caused by: java.lang.IllegalArgumentException: Invalid DECIMAL scale: -9
at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
{code}

And, this is the Parquet code 
[here|https://github.com/Parquet/parquet-mr/blob/master/parquet-column/src/main/java/parquet/schema/Types.java#L405-L417].

It apparently does not fully support BigDecimal.

{code}
protected DecimalMetadata decimalMetadata() {
  DecimalMetadata meta = null;
  if (OriginalType.DECIMAL == originalType) {
Preconditions.checkArgument(precision > 0,
"Invalid DECIMAL precision: " + precision);
Preconditions.checkArgument(scale >= 0,
"Invalid DECIMAL scale: " + scale);
Preconditions.checkArgument(scale <= precision,
"Invalid DECIMAL scale: cannot be greater than precision");
meta = new DecimalMetadata(precision, scale);
  }
  return meta;
}
  }
{code}

I cannot make a PR for Parquet.
I hope you can file another issue for that in the Apache Parquet JIRA.


was (Author: dongjoon):
Hi, [~Navya Krishnappa].

As you can see in the stack trace, that is a different exception, coming from 
Apache Parquet code.
{code}
Caused by: java.lang.IllegalArgumentException: Invalid DECIMAL scale: -9
at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
{code}

You can register another issue for that, but maybe in Apache Parquet JIRA.

> Unable to read given csv data. Excepion: java.lang.IllegalArgumentException: 
> requirement failed: Decimal precision 28 exceeds max precision 20
> --
>
> Key: SPARK-18877
> URL: https://issues.apache.org/jira/browse/SPARK-18877
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Navya Krishnappa
>
> When reading the below-mentioned CSV data, even though the maximum decimal 
> precision is 38, the following exception is thrown: 
> java.lang.IllegalArgumentException: requirement failed: Decimal precision 28 
> exceeds max precision 20
> Decimal
> 2323366225312000
> 2433573971400
> 23233662253000
> 23233662253



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18817) Ensure nothing is written outside R's tempdir() by default

2016-12-17 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15757998#comment-15757998
 ] 

Felix Cheung commented on SPARK-18817:
--

Aside from changing the existing shipped behavior, there are a few mentions of 
this behavior in various documentation that would become wrong and would need 
to be updated.

IMO, more importantly, we still have a feature that can be turned on (as 
documented or suggested in the documentation) that would cause files to be written 
without the user explicitly agreeing to it (or understanding it). To me this 
doesn't seem like we would be addressing the root of the issue fully, merely 
side-stepping it?

I've managed to track down the fix to move metastore_db and derby.log though. 
There are two separate switches to set, and it is doable from pure R; but I'd 
recommend doing it in 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L116
 in order to respect any existing value from hive-site.xml if one is given. 

How about we introduce something like spark.sql.default.derby.dir and fix this 
that way?

> Ensure nothing is written outside R's tempdir() by default
> --
>
> Key: SPARK-18817
> URL: https://issues.apache.org/jira/browse/SPARK-18817
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Brendan Dwyer
>Priority: Critical
>
> Per CRAN policies
> https://cran.r-project.org/web/packages/policies.html
> {quote}
> - Packages should not write in the users’ home filespace, nor anywhere else 
> on the file system apart from the R session’s temporary directory (or during 
> installation in the location pointed to by TMPDIR: and such usage should be 
> cleaned up). Installing into the system’s R installation (e.g., scripts to 
> its bin directory) is not allowed.
> Limited exceptions may be allowed in interactive sessions if the package 
> obtains confirmation from the user.
> - Packages should not modify the global environment (user’s workspace).
> {quote}
> Currently "spark-warehouse" gets created in the working directory when 
> sparkR.session() is called.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-18817) Ensure nothing is written outside R's tempdir() by default

2016-12-17 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15757998#comment-15757998
 ] 

Felix Cheung edited comment on SPARK-18817 at 12/18/16 2:04 AM:


Aside from changing the existing shipped behavior, there are a few mentions of 
this behavior in various documentation that would become wrong and would need 
to be updated.

IMO, more importantly, we still have a feature that can be turned on (as 
documented or suggested in the documentation) that would cause files to be written 
without the user explicitly agreeing to it (or understanding it). To me this 
doesn't seem like we would be addressing the root of the issue fully, merely 
side-stepping it?

I've managed to track down the fix to move metastore_db and derby.log though. 
There are two separate switches to set, and it is doable from pure R (I have 
tested that); but I'd recommend doing it in 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L116
 in order to respect any existing value from hive-site.xml if one is given. 

How about we introduce something like spark.sql.default.derby.dir and fix this 
that way?


was (Author: felixcheung):
Aside from changing the existing shipped behavior, there are a few mentions of 
this behavior in various documentation that would become wrong and would need 
to be updated.

IMO, more importantly, we still have a feature that can be turned on (as 
documented or suggested in the documentation) that would cause files to be written 
without the user explicitly agreeing to it (or understanding it). To me this 
doesn't seem like we would be addressing the root of the issue fully, merely 
side-stepping it?

I've managed to track down the fix to move metastore_db and derby.log though. 
There are two separate switches to set, and it is doable from pure R; but I'd 
recommend doing it in 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L116
 in order to respect any existing value from hive-site.xml if one is given. 

How about we introduce something like spark.sql.default.derby.dir and fix this 
that way?

> Ensure nothing is written outside R's tempdir() by default
> --
>
> Key: SPARK-18817
> URL: https://issues.apache.org/jira/browse/SPARK-18817
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Brendan Dwyer
>Priority: Critical
>
> Per CRAN policies
> https://cran.r-project.org/web/packages/policies.html
> {quote}
> - Packages should not write in the users’ home filespace, nor anywhere else 
> on the file system apart from the R session’s temporary directory (or during 
> installation in the location pointed to by TMPDIR: and such usage should be 
> cleaned up). Installing into the system’s R installation (e.g., scripts to 
> its bin directory) is not allowed.
> Limited exceptions may be allowed in interactive sessions if the package 
> obtains confirmation from the user.
> - Packages should not modify the global environment (user’s workspace).
> {quote}
> Currently "spark-warehouse" gets created in the working directory when 
> sparkR.session() is called.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18817) Ensure nothing is written outside R's tempdir() by default

2016-12-17 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15758005#comment-15758005
 ] 

Felix Cheung commented on SPARK-18817:
--

And as a side note, I feel like spark-warehouse, metastore_db, and derby.log 
should all be in a temporary directory that is cleaned up when the app is done, 
much like what the Spark Thrift Server does currently (at least for metastore_db 
and derby.log). Perhaps that is the more correct fix in the longer term.

> Ensure nothing is written outside R's tempdir() by default
> --
>
> Key: SPARK-18817
> URL: https://issues.apache.org/jira/browse/SPARK-18817
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Brendan Dwyer
>Priority: Critical
>
> Per CRAN policies
> https://cran.r-project.org/web/packages/policies.html
> {quote}
> - Packages should not write in the users’ home filespace, nor anywhere else 
> on the file system apart from the R session’s temporary directory (or during 
> installation in the location pointed to by TMPDIR: and such usage should be 
> cleaned up). Installing into the system’s R installation (e.g., scripts to 
> its bin directory) is not allowed.
> Limited exceptions may be allowed in interactive sessions if the package 
> obtains confirmation from the user.
> - Packages should not modify the global environment (user’s workspace).
> {quote}
> Currently "spark-warehouse" gets created in the working directory when 
> sparkR.session() is called.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-18915) Return Nothing when Querying a Partitioned Data Source Table without Repairing it

2016-12-17 Thread Xiao Li (JIRA)
Xiao Li created SPARK-18915:
---

 Summary: Return Nothing when Querying a Partitioned Data Source 
Table without Repairing it
 Key: SPARK-18915
 URL: https://issues.apache.org/jira/browse/SPARK-18915
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.1.0
Reporter: Xiao Li
Priority: Critical


In Spark 2.1, if we create a partitioned data source table at a specified 
path, it returns nothing when we try to query it. To get the data, we have to 
manually issue a DDL to repair the table (see the sketch below). 
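
A minimal sketch of that repair step (assuming the {{newTab}} table created in 
the reproduction below; the exact DDL may vary by setup):

{code}
spark.sql("MSCK REPAIR TABLE newTab")
{code}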

In Spark 2.0, it can return the data stored in the specified path, without 
repairing the table.

Below is the output of Spark 2.1. 

{noformat}
scala> spark.range(5).selectExpr("id as fieldOne", "id as 
partCol").write.partitionBy("partCol").mode("overwrite").saveAsTable("test")
[Stage 0:==>(3 + 5) / 
8]SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.

scala> spark.sql("select * from test").show()
++---+
|fieldOne|partCol|
++---+
|   0|  0|
|   1|  1|
|   2|  2|
|   3|  3|
|   4|  4|
++---+


scala> spark.sql("desc formatted test").show(50, false)
++--+---+
|col_name|data_type 
|comment|
++--+---+
|fieldOne|bigint
|null   |
|partCol |bigint
|null   |
|# Partition Information |  
|   |
|# col_name  |data_type 
|comment|
|partCol |bigint
|null   |
||  
|   |
|# Detailed Table Information|  
|   |
|Database:   |default   
|   |
|Owner:  |xiaoli
|   |
|Create Time:|Sat Dec 17 17:46:24 PST 2016  
|   |
|Last Access Time:   |Wed Dec 31 16:00:00 PST 1969  
|   |
|Location:   
|file:/Users/xiaoli/IdeaProjects/sparkDelivery/bin/spark-warehouse/test|   |
|Table Type: |MANAGED   
|   |
|Table Parameters:   |  
|   |
|  transient_lastDdlTime |1482025584
|   |
||  
|   |
|# Storage Information   |  
|   |
|SerDe Library:  
|org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe   |   |
|InputFormat:
|org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat |   |
|OutputFormat:   
|org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat|   |
|Compressed: |No
|   |
|Storage Desc Parameters:|  
|   |
|  serialization.format  |1 
|   |
|Partition Provider: |Catalog   
|   |
++--+---+


scala> spark.sql(s"create table newTab (fieldOne long, partCol int) using 
parquet options (path 
'file:/Users/xiaoli/IdeaProjects/sparkDelivery/bin/spark-warehouse/test') 
partitioned by (partCol)")
res3: org.apache.spark.sql.DataFrame = []

scala> spark.table("newTab").show()
++---+
|fieldOne|partCol|
+

[jira] [Created] (SPARK-18916) Bug in Pregel / mergeMsg with hashmaps

2016-12-17 Thread Seth Bromberger (JIRA)
Seth Bromberger created SPARK-18916:
---

 Summary: Bug in Pregel / mergeMsg with hashmaps
 Key: SPARK-18916
 URL: https://issues.apache.org/jira/browse/SPARK-18916
 Project: Spark
  Issue Type: Bug
  Components: GraphX
Affects Versions: 2.0.2
 Environment: OSX / IntelliJ IDEA 2016.3 CE EAP, Scala 2.11.8, Spark 
2.0.2
Reporter: Seth Bromberger


Consider the following (rough) code that attempts to calculate all-pairs 
shortest paths via pregel:

{code:scala}
def allPairsShortestPaths: RDD[(VertexId, HashMap[VertexId, ParentDist])] = 
{
  val initialMsg = HashMap(-1L -> ParentDist(-1L, -1L))
  val pregelg = g.mapVertices((vid, vd) => (vd, HashMap[VertexId, 
ParentDist](vid -> ParentDist(vid, 0L)))).reverse

  def vprog(v: VertexId, value: (VD, HashMap[VertexId, ParentDist]), 
message: HashMap[VertexId, ParentDist]): (VD, HashMap[VertexId, ParentDist]) = {
val updatedValues = mm2(value._2, message).filter(v => v._2.dist >= 0)
(value._1, updatedValues)
  }

  def sendMsg(triplet: EdgeTriplet[(VD, HashMap[VertexId, ParentDist]), 
ED]): Iterator[(VertexId, HashMap[VertexId, ParentDist])] = {
val dstVertexId = triplet.dstId
val srcMap = triplet.srcAttr._2
val dstMap = triplet.dstAttr._2  // guaranteed to have dstVertexId as a 
key

val updatesToSend : HashMap[VertexId, ParentDist] = srcMap.filter {
  case (vid, srcPD) => dstMap.get(vid) match {
case Some(dstPD) => dstPD.dist > srcPD.dist + 1   // if it exists, 
is it cheaper?
case _ => true // not found - new update
  }
}.map(u => u._1 -> ParentDist(triplet.srcId, u._2.dist +1))


if (updatesToSend.nonEmpty)
  Iterator[(VertexId, HashMap[VertexId, ParentDist])]((dstVertexId, 
updatesToSend))
else
  Iterator.empty
  }

  def mergeMsg(m1: HashMap[VertexId, ParentDist], m2: HashMap[VertexId, 
ParentDist]): HashMap[VertexId, ParentDist] = {

// when the following two lines are commented out, the program fails 
with
// 16/12/17 19:53:50 INFO DAGScheduler: Job 24 failed: reduce at 
VertexRDDImpl.scala:88, took 0.244042 s
// Exception in thread "main" org.apache.spark.SparkException: Job 
aborted due to stage failure: Task 0 in stage 1099.0 failed 1 times, most 
recent failure: Lost task 0.0 in stage 1099.0 (TID 129, localhost): 
scala.MatchError: (null,null) (of class scala.Tuple2)

m1.foreach(_ => ())
m2.foreach(_ => ())

m1.merged(m2) {
  case ((k1, v1), (_, v2)) => (k1, v1.min(v2))
}
  }

  // mm2 is here just to provide a separate function for vprog. Ideally 
we'd just re-use mergeMsg.
  def mm2(m1: HashMap[VertexId, ParentDist], m2: HashMap[VertexId, 
ParentDist]): HashMap[VertexId, ParentDist] = {
m1.merged(m2) {
  case ((k1, v1), (_, v2)) => (k1, v1.min(v2))
  case n => throw new Exception("we've got a problem: " + n)
}
  }

  val pregelRun = pregelg.pregel(initialMsg)(vprog, sendMsg, mergeMsg)
  val sps = pregelRun.vertices.map(v => v._1 -> v._2._2)
  sps
}
  }
{code}

Note the comment in the mergeMsg function: when the messages are explicitly 
accessed (the two no-op foreach calls) before the .merged call, the code works. 
If those side-effecting statements are removed or commented out, the error 
shown in the comments is thrown.

This fails consistently on a 50-node undirected cycle graph.
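
For comparison, here is a minimal sketch of the same merge written with foldLeft 
instead of HashMap.merged. It reuses the reporter's ParentDist type and its min 
method, and is offered only to help isolate whether .merged itself is implicated; 
it is not a confirmed fix.

{code:scala}
import scala.collection.immutable.HashMap

import org.apache.spark.graphx.VertexId

// Same semantics as mergeMsg above, but without HashMap.merged:
// for every key in m2, keep the smaller ParentDist when the key already exists in m1.
def mergeMsgFold(m1: HashMap[VertexId, ParentDist],
                 m2: HashMap[VertexId, ParentDist]): HashMap[VertexId, ParentDist] =
  m2.foldLeft(m1) { case (acc, (k, v2)) =>
    acc.updated(k, acc.get(k).map(_.min(v2)).getOrElse(v2))
  }
{code}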






[jira] [Updated] (SPARK-18916) Bug in Pregel / mergeMsg with hashmaps

2016-12-17 Thread Seth Bromberger (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Seth Bromberger updated SPARK-18916:

Description: 
Consider the following (rough) code that attempts to calculate all-pairs 
shortest paths via pregel:

{code:java}
def allPairsShortestPaths: RDD[(VertexId, HashMap[VertexId, ParentDist])] = 
{
  val initialMsg = HashMap(-1L -> ParentDist(-1L, -1L))
  val pregelg = g.mapVertices((vid, vd) => (vd, HashMap[VertexId, 
ParentDist](vid -> ParentDist(vid, 0L)))).reverse

  def vprog(v: VertexId, value: (VD, HashMap[VertexId, ParentDist]), 
message: HashMap[VertexId, ParentDist]): (VD, HashMap[VertexId, ParentDist]) = {
val updatedValues = mm2(value._2, message).filter(v => v._2.dist >= 0)
(value._1, updatedValues)
  }

  def sendMsg(triplet: EdgeTriplet[(VD, HashMap[VertexId, ParentDist]), 
ED]): Iterator[(VertexId, HashMap[VertexId, ParentDist])] = {
val dstVertexId = triplet.dstId
val srcMap = triplet.srcAttr._2
val dstMap = triplet.dstAttr._2  // guaranteed to have dstVertexId as a 
key

val updatesToSend : HashMap[VertexId, ParentDist] = srcMap.filter {
  case (vid, srcPD) => dstMap.get(vid) match {
case Some(dstPD) => dstPD.dist > srcPD.dist + 1   // if it exists, 
is it cheaper?
case _ => true // not found - new update
  }
}.map(u => u._1 -> ParentDist(triplet.srcId, u._2.dist +1))


if (updatesToSend.nonEmpty)
  Iterator[(VertexId, HashMap[VertexId, ParentDist])]((dstVertexId, 
updatesToSend))
else
  Iterator.empty
  }

  def mergeMsg(m1: HashMap[VertexId, ParentDist], m2: HashMap[VertexId, 
ParentDist]): HashMap[VertexId, ParentDist] = {

// when the following two lines are commented out, the program fails 
with
// 16/12/17 19:53:50 INFO DAGScheduler: Job 24 failed: reduce at 
VertexRDDImpl.scala:88, took 0.244042 s
// Exception in thread "main" org.apache.spark.SparkException: Job 
aborted due to stage failure: Task 0 in stage 1099.0 failed 1 times, most 
recent failure: Lost task 0.0 in stage 1099.0 (TID 129, localhost): 
scala.MatchError: (null,null) (of class scala.Tuple2)

m1.foreach(_ => ())
m2.foreach(_ => ())

m1.merged(m2) {
  case ((k1, v1), (_, v2)) => (k1, v1.min(v2))
}
  }

  // mm2 is here just to provide a separate function for vprog. Ideally 
we'd just re-use mergeMsg.
  def mm2(m1: HashMap[VertexId, ParentDist], m2: HashMap[VertexId, 
ParentDist]): HashMap[VertexId, ParentDist] = {
m1.merged(m2) {
  case ((k1, v1), (_, v2)) => (k1, v1.min(v2))
  case n => throw new Exception("we've got a problem: " + n)
}
  }

  val pregelRun = pregelg.pregel(initialMsg)(vprog, sendMsg, mergeMsg)
  val sps = pregelRun.vertices.map(v => v._1 -> v._2._2)
  sps
}
  }
{code}

Note the comment in the mergeMsg function: when the messages are explicitly 
accessed (the two no-op foreach calls) before the .merged call, the code works. 
If those side-effecting statements are removed or commented out, the error 
shown in the comments is thrown.

This fails consistently on a 50-node undirected cycle graph.

  was:
Consider the following (rough) code that attempts to calculate all-pairs 
shortest paths via pregel:

{code:scala}
def allPairsShortestPaths: RDD[(VertexId, HashMap[VertexId, ParentDist])] = 
{
  val initialMsg = HashMap(-1L -> ParentDist(-1L, -1L))
  val pregelg = g.mapVertices((vid, vd) => (vd, HashMap[VertexId, 
ParentDist](vid -> ParentDist(vid, 0L.reverse

  def vprog(v: VertexId, value: (VD, HashMap[VertexId, ParentDist]), 
message: HashMap[VertexId, ParentDist]): (VD, HashMap[VertexId, ParentDist]) = {
val updatedValues = mm2(value._2, message).filter(v => v._2.dist >= 0)
(value._1, updatedValues)
  }

  def sendMsg(triplet: EdgeTriplet[(VD, HashMap[VertexId, ParentDist]), 
ED]): Iterator[(VertexId, HashMap[VertexId, ParentDist])] = {
val dstVertexId = triplet.dstId
val srcMap = triplet.srcAttr._2
val dstMap = triplet.dstAttr._2  // guaranteed to have dstVertexId as a 
key

val updatesToSend : HashMap[VertexId, ParentDist] = srcMap.filter {
  case (vid, srcPD) => dstMap.get(vid) match {
case Some(dstPD) => dstPD.dist > srcPD.dist + 1   // if it exists, 
is it cheaper?
case _ => true // not found - new update
  }
}.map(u => u._1 -> ParentDist(triplet.srcId, u._2.dist +1))


if (updatesToSend.nonEmpty)
  Iterator[(VertexId, HashMap[VertexId, ParentDist])]((dstVertexId, 
updatesToSend))
else
  Iterator.empty
  }

  def mergeMsg(m1: HashMap[VertexId, ParentDist], m2: HashMap[VertexId, 
ParentDist]): HashMap[VertexId, ParentDist] = {

// when the following two

[jira] [Updated] (SPARK-18916) Possible bug in Pregel / mergeMsg with hashmaps

2016-12-17 Thread Seth Bromberger (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Seth Bromberger updated SPARK-18916:

Summary: Possible bug in Pregel / mergeMsg with hashmaps  (was: Bug in 
Pregel / mergeMsg with hashmaps)

> Possible bug in Pregel / mergeMsg with hashmaps
> ---
>
> Key: SPARK-18916
> URL: https://issues.apache.org/jira/browse/SPARK-18916
> Project: Spark
>  Issue Type: Bug
>  Components: GraphX
>Affects Versions: 2.0.2
> Environment: OSX / IntelliJ IDEA 2016.3 CE EAP, Scala 2.11.8, Spark 
> 2.0.2
>Reporter: Seth Bromberger
>  Labels: error, graphx, pregel
>
> Consider the following (rough) code that attempts to calculate all-pairs 
> shortest paths via pregel:
> {code:java}
> def allPairsShortestPaths: RDD[(VertexId, HashMap[VertexId, ParentDist])] 
> = {
>   val initialMsg = HashMap(-1L -> ParentDist(-1L, -1L))
>   val pregelg = g.mapVertices((vid, vd) => (vd, HashMap[VertexId, 
> ParentDist](vid -> ParentDist(vid, 0L)))).reverse
>   def vprog(v: VertexId, value: (VD, HashMap[VertexId, ParentDist]), 
> message: HashMap[VertexId, ParentDist]): (VD, HashMap[VertexId, ParentDist]) 
> = {
> val updatedValues = mm2(value._2, message).filter(v => v._2.dist >= 0)
> (value._1, updatedValues)
>   }
>   def sendMsg(triplet: EdgeTriplet[(VD, HashMap[VertexId, ParentDist]), 
> ED]): Iterator[(VertexId, HashMap[VertexId, ParentDist])] = {
> val dstVertexId = triplet.dstId
> val srcMap = triplet.srcAttr._2
> val dstMap = triplet.dstAttr._2  // guaranteed to have dstVertexId as 
> a key
> val updatesToSend : HashMap[VertexId, ParentDist] = srcMap.filter {
>   case (vid, srcPD) => dstMap.get(vid) match {
> case Some(dstPD) => dstPD.dist > srcPD.dist + 1   // if it 
> exists, is it cheaper?
> case _ => true // not found - new update
>   }
> }.map(u => u._1 -> ParentDist(triplet.srcId, u._2.dist +1))
> if (updatesToSend.nonEmpty)
>   Iterator[(VertexId, HashMap[VertexId, ParentDist])]((dstVertexId, 
> updatesToSend))
> else
>   Iterator.empty
>   }
>   def mergeMsg(m1: HashMap[VertexId, ParentDist], m2: HashMap[VertexId, 
> ParentDist]): HashMap[VertexId, ParentDist] = {
> // when the following two lines are commented out, the program fails 
> with
> // 16/12/17 19:53:50 INFO DAGScheduler: Job 24 failed: reduce at 
> VertexRDDImpl.scala:88, took 0.244042 s
> // Exception in thread "main" org.apache.spark.SparkException: Job 
> aborted due to stage failure: Task 0 in stage 1099.0 failed 1 times, most 
> recent failure: Lost task 0.0 in stage 1099.0 (TID 129, localhost): 
> scala.MatchError: (null,null) (of class scala.Tuple2)
> m1.foreach(_ => ())
> m2.foreach(_ => ())
> m1.merged(m2) {
>   case ((k1, v1), (_, v2)) => (k1, v1.min(v2))
> }
>   }
>   // mm2 is here just to provide a separate function for vprog. Ideally 
> we'd just re-use mergeMsg.
>   def mm2(m1: HashMap[VertexId, ParentDist], m2: HashMap[VertexId, 
> ParentDist]): HashMap[VertexId, ParentDist] = {
> m1.merged(m2) {
>   case ((k1, v1), (_, v2)) => (k1, v1.min(v2))
>   case n => throw new Exception("we've got a problem: " + n)
> }
>   }
>   val pregelRun = pregelg.pregel(initialMsg)(vprog, sendMsg, mergeMsg)
>   val sps = pregelRun.vertices.map(v => v._1 -> v._2._2)
>   sps
> }
>   }
> {code}
> Note the comment in the mergeMsg function: when the messages are explicitly 
> accessed prior to the .merged statement, the code works. If these side-effect 
> statements are removed / commented out, the error message in the comments is 
> generated.
> This fails consistently on a 50-node undirected cycle graph.






[jira] [Assigned] (SPARK-18915) Return Nothing when Querying a Partitioned Data Source Table without Repairing it

2016-12-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18915:


Assignee: Apache Spark

> Return Nothing when Querying a Partitioned Data Source Table without 
> Repairing it
> -
>
> Key: SPARK-18915
> URL: https://issues.apache.org/jira/browse/SPARK-18915
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Xiao Li
>Assignee: Apache Spark
>Priority: Critical
>
> In Spark 2.1, if we create a partitioned data source table given a specified 
> path, it returns nothing when we try to query it. To get the data, we have to 
> manually issue a DDL to repair the table. 
> In Spark 2.0, it can return the data stored in the specified path, without 
> repairing the table.
> Below is the output of Spark 2.1. 
> {noformat}
> scala> spark.range(5).selectExpr("id as fieldOne", "id as 
> partCol").write.partitionBy("partCol").mode("overwrite").saveAsTable("test")
> [Stage 0:==>(3 + 5) / 
> 8]SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
> details.
>   
>   
> scala> spark.sql("select * from test").show()
> ++---+
> |fieldOne|partCol|
> ++---+
> |   0|  0|
> |   1|  1|
> |   2|  2|
> |   3|  3|
> |   4|  4|
> ++---+
> scala> spark.sql("desc formatted test").show(50, false)
> ++--+---+
> |col_name|data_type   
>   |comment|
> ++--+---+
> |fieldOne|bigint  
>   |null   |
> |partCol |bigint  
>   |null   |
> |# Partition Information |
>   |   |
> |# col_name  |data_type   
>   |comment|
> |partCol |bigint  
>   |null   |
> ||
>   |   |
> |# Detailed Table Information|
>   |   |
> |Database:   |default 
>   |   |
> |Owner:  |xiaoli  
>   |   |
> |Create Time:|Sat Dec 17 17:46:24 PST 2016
>   |   |
> |Last Access Time:   |Wed Dec 31 16:00:00 PST 1969
>   |   |
> |Location:   
> |file:/Users/xiaoli/IdeaProjects/sparkDelivery/bin/spark-warehouse/test|  
>  |
> |Table Type: |MANAGED 
>   |   |
> |Table Parameters:   |
>   |   |
> |  transient_lastDdlTime |1482025584  
>   |   |
> ||
>   |   |
> |# Storage Information   |
>   |   |
> |SerDe Library:  
> |org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe   |  
>  |
> |InputFormat:
> |org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat |  
>  |
> |OutputFormat:   
> |org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat|  
>  |
> |Compressed: |No  
>   |   |
> |Storage Desc Parameters:|
>   |   |
> |  serialization.format  |1   
>   |   |
> |Partition Provider: |Catalog 
>   

[jira] [Commented] (SPARK-18915) Return Nothing when Querying a Partitioned Data Source Table without Repairing it

2016-12-17 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15758144#comment-15758144
 ] 

Apache Spark commented on SPARK-18915:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/16326

> Return Nothing when Querying a Partitioned Data Source Table without 
> Repairing it
> -
>
> Key: SPARK-18915
> URL: https://issues.apache.org/jira/browse/SPARK-18915
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Xiao Li
>Priority: Critical
>
> In Spark 2.1, if we create a partitioned data source table given a specified 
> path, it returns nothing when we try to query it. To get the data, we have to 
> manually issue a DDL to repair the table. 
> In Spark 2.0, it can return the data stored in the specified path, without 
> repairing the table.
> Below is the output of Spark 2.1. 
> {noformat}
> scala> spark.range(5).selectExpr("id as fieldOne", "id as 
> partCol").write.partitionBy("partCol").mode("overwrite").saveAsTable("test")
> [Stage 0:==>(3 + 5) / 
> 8]SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
> details.
>   
>   
> scala> spark.sql("select * from test").show()
> ++---+
> |fieldOne|partCol|
> ++---+
> |   0|  0|
> |   1|  1|
> |   2|  2|
> |   3|  3|
> |   4|  4|
> ++---+
> scala> spark.sql("desc formatted test").show(50, false)
> ++--+---+
> |col_name|data_type   
>   |comment|
> ++--+---+
> |fieldOne|bigint  
>   |null   |
> |partCol |bigint  
>   |null   |
> |# Partition Information |
>   |   |
> |# col_name  |data_type   
>   |comment|
> |partCol |bigint  
>   |null   |
> ||
>   |   |
> |# Detailed Table Information|
>   |   |
> |Database:   |default 
>   |   |
> |Owner:  |xiaoli  
>   |   |
> |Create Time:|Sat Dec 17 17:46:24 PST 2016
>   |   |
> |Last Access Time:   |Wed Dec 31 16:00:00 PST 1969
>   |   |
> |Location:   
> |file:/Users/xiaoli/IdeaProjects/sparkDelivery/bin/spark-warehouse/test|  
>  |
> |Table Type: |MANAGED 
>   |   |
> |Table Parameters:   |
>   |   |
> |  transient_lastDdlTime |1482025584  
>   |   |
> ||
>   |   |
> |# Storage Information   |
>   |   |
> |SerDe Library:  
> |org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe   |  
>  |
> |InputFormat:
> |org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat |  
>  |
> |OutputFormat:   
> |org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat|  
>  |
> |Compressed: |No  
>   |   |
> |Storage Desc Parameters:|
>   |   |
> |  serialization.format  |1   
>   |   |
> |Pa

[jira] [Assigned] (SPARK-18915) Return Nothing when Querying a Partitioned Data Source Table without Repairing it

2016-12-17 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18915:


Assignee: (was: Apache Spark)

> Return Nothing when Querying a Partitioned Data Source Table without 
> Repairing it
> -
>
> Key: SPARK-18915
> URL: https://issues.apache.org/jira/browse/SPARK-18915
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Xiao Li
>Priority: Critical
>
> In Spark 2.1, if we create a partitioned data source table given a specified 
> path, it returns nothing when we try to query it. To get the data, we have to 
> manually issue a DDL to repair the table. 
> In Spark 2.0, it can return the data stored in the specified path, without 
> repairing the table.
> Below is the output of Spark 2.1. 
> {noformat}
> scala> spark.range(5).selectExpr("id as fieldOne", "id as 
> partCol").write.partitionBy("partCol").mode("overwrite").saveAsTable("test")
> [Stage 0:==>(3 + 5) / 
> 8]SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
> details.
>   
>   
> scala> spark.sql("select * from test").show()
> ++---+
> |fieldOne|partCol|
> ++---+
> |   0|  0|
> |   1|  1|
> |   2|  2|
> |   3|  3|
> |   4|  4|
> ++---+
> scala> spark.sql("desc formatted test").show(50, false)
> ++--+---+
> |col_name|data_type   
>   |comment|
> ++--+---+
> |fieldOne|bigint  
>   |null   |
> |partCol |bigint  
>   |null   |
> |# Partition Information |
>   |   |
> |# col_name  |data_type   
>   |comment|
> |partCol |bigint  
>   |null   |
> ||
>   |   |
> |# Detailed Table Information|
>   |   |
> |Database:   |default 
>   |   |
> |Owner:  |xiaoli  
>   |   |
> |Create Time:|Sat Dec 17 17:46:24 PST 2016
>   |   |
> |Last Access Time:   |Wed Dec 31 16:00:00 PST 1969
>   |   |
> |Location:   
> |file:/Users/xiaoli/IdeaProjects/sparkDelivery/bin/spark-warehouse/test|  
>  |
> |Table Type: |MANAGED 
>   |   |
> |Table Parameters:   |
>   |   |
> |  transient_lastDdlTime |1482025584  
>   |   |
> ||
>   |   |
> |# Storage Information   |
>   |   |
> |SerDe Library:  
> |org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe   |  
>  |
> |InputFormat:
> |org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat |  
>  |
> |OutputFormat:   
> |org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat|  
>  |
> |Compressed: |No  
>   |   |
> |Storage Desc Parameters:|
>   |   |
> |  serialization.format  |1   
>   |   |
> |Partition Provider: |Catalog 
>   |   |
> +

[jira] [Created] (SPARK-18917) Dataframe - Time Out Issues / Taking long time in append mode on object stores

2016-12-17 Thread Anbu Cheeralan (JIRA)
Anbu Cheeralan created SPARK-18917:
--

 Summary: Dataframe - Time Out Issues / Taking long time in append 
mode on object stores
 Key: SPARK-18917
 URL: https://issues.apache.org/jira/browse/SPARK-18917
 Project: Spark
  Issue Type: Improvement
  Components: EC2, SQL, YARN
Affects Versions: 2.0.2
Reporter: Anbu Cheeralan
Priority: Minor
 Fix For: 2.1.0, 2.1.1


When using DataFrame write in append mode on object stores (S3 / Google Cloud 
Storage), writes take a very long time or hit read timeouts. This is because 
dataframe.write lists all leaf folders in the target directory; if there are 
many subfolders due to partitioning, this takes forever.

The following code in org.apache.spark.sql.execution.datasources.DataSource.write() 
causes a huge number of RPC calls when the file system is an object store (S3, GS):

if (mode == SaveMode.Append) {
  val existingPartitionColumns = Try {
    resolveRelation()
      .asInstanceOf[HadoopFsRelation]
      .location
      .partitionSpec()
      .partitionColumns
      .fieldNames
      .toSeq
  }.getOrElse(Seq.empty[String])

There should be a flag to skip the partition match check in append mode; a 
sketch of the idea follows. I can work on the patch.
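
A minimal sketch of what such an escape hatch could look like. The option name 
spark.sql.sources.skipPartitionMatchCheckOnAppend and the surrounding wiring are 
hypothetical and for illustration only; they are not an existing Spark 
configuration or a committed design.

{code:scala}
// Hypothetical option name; not an existing Spark config key.
val skipCheck = sparkSession.conf
  .get("spark.sql.sources.skipPartitionMatchCheckOnAppend", "false")
  .toBoolean

val existingPartitionColumns =
  if (mode == SaveMode.Append && skipCheck) {
    // Trust the caller-supplied partition columns and skip the expensive
    // leaf-folder listing on the object store.
    Seq.empty[String]
  } else {
    Try {
      resolveRelation()
        .asInstanceOf[HadoopFsRelation]
        .location
        .partitionSpec()
        .partitionColumns
        .fieldNames
        .toSeq
    }.getOrElse(Seq.empty[String])
  }
{code}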


