[jira] [Commented] (SPARK-13701) MLlib ALS fails on arm64 (java.lang.UnsatisfiedLinkError: org.jblas.NativeBlas.dgemm))

2016-03-06 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182214#comment-15182214
 ] 

Santiago M. Mola commented on SPARK-13701:
--

Installed gfortran. Now NNLSSuite fails, while ALSSuite succeeds.

{code}
[info] NNLSSuite:
[info] Exception encountered when attempting to run a suite with class name: 
org.apache.spark.mllib.optimization.NNLSSuite *** ABORTED *** (68 milliseconds)
[info]   java.lang.UnsatisfiedLinkError: 
org.jblas.NativeBlas.dgemm(CCIIID[DII[DIID[DII)V
[info]   at org.jblas.NativeBlas.dgemm(Native Method)
[info]   at org.jblas.SimpleBlas.gemm(SimpleBlas.java:247)
[info]   at org.jblas.DoubleMatrix.mmuli(DoubleMatrix.java:1781)
[info]   at org.jblas.DoubleMatrix.mmul(DoubleMatrix.java:3138)
[info]   at 
org.apache.spark.mllib.optimization.NNLSSuite.genOnesData(NNLSSuite.scala:33)
[info]   at 
org.apache.spark.mllib.optimization.NNLSSuite$$anonfun$2$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(NNLSSuite.scala:56)
[info]   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:166)
[info]   at 
org.apache.spark.mllib.optimization.NNLSSuite$$anonfun$2.apply$mcV$sp(NNLSSuite.scala:55)
[info]   at 
org.apache.spark.mllib.optimization.NNLSSuite$$anonfun$2.apply(NNLSSuite.scala:45)
[info]   at 
org.apache.spark.mllib.optimization.NNLSSuite$$anonfun$2.apply(NNLSSuite.scala:45)
{code}
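
For what it's worth, the native-library problem can be checked outside the test suites with a minimal jblas program (illustrative sketch only; the matrices are arbitrary). Any matrix multiplication goes through NativeBlas.dgemm, so this fails with the same UnsatisfiedLinkError when the native jblas/BLAS library cannot be loaded:

{code}
import org.jblas.DoubleMatrix

object JblasCheck {
  def main(args: Array[String]): Unit = {
    // mmul -> mmuli -> SimpleBlas.gemm -> NativeBlas.dgemm (native call)
    val a = new DoubleMatrix(2, 2, 1.0, 2.0, 3.0, 4.0)
    val b = DoubleMatrix.eye(2)
    println(a.mmul(b))
  }
}
{code}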

> MLlib ALS fails on arm64 (java.lang.UnsatisfiedLinkError: 
> org.jblas.NativeBlas.dgemm))
> --
>
> Key: SPARK-13701
> URL: https://issues.apache.org/jira/browse/SPARK-13701
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
> Environment: Ubuntu 14.04 on aarch64
>Reporter: Santiago M. Mola
>Priority: Minor
>  Labels: arm64, porting
>
> jblas fails on arm64.
> {code}
> ALSSuite:
> Exception encountered when attempting to run a suite with class name: 
> org.apache.spark.mllib.recommendation.ALSSuite *** ABORTED *** (112 
> milliseconds)
>   java.lang.UnsatisfiedLinkError: 
> org.jblas.NativeBlas.dgemm(CCIIID[DII[DIID[DII)V
>   at org.jblas.NativeBlas.dgemm(Native Method)
>   at org.jblas.SimpleBlas.gemm(SimpleBlas.java:247)
>   at org.jblas.DoubleMatrix.mmuli(DoubleMatrix.java:1781)
>   at org.jblas.DoubleMatrix.mmul(DoubleMatrix.java:3138)
>   at 
> org.apache.spark.mllib.recommendation.ALSSuite$.generateRatings(ALSSuite.scala:74)
> {code}






[jira] [Commented] (SPARK-13701) MLlib ALS fails on arm64 (java.lang.UnsatisfiedLinkError: org.jblas.NativeBlas.dgemm))

2016-03-05 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15181921#comment-15181921
 ] 

Santiago M. Mola commented on SPARK-13701:
--

This is probably just gfortran not being installed? I'll test as soon as 
possible.

> MLlib ALS fails on arm64 (java.lang.UnsatisfiedLinkError: 
> org.jblas.NativeBlas.dgemm))
> --
>
> Key: SPARK-13701
> URL: https://issues.apache.org/jira/browse/SPARK-13701
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
> Environment: Ubuntu 14.04 on aarch64
>Reporter: Santiago M. Mola
>Priority: Minor
>  Labels: arm64, porting
>
> jblas fails on arm64.
> {code}
> ALSSuite:
> Exception encountered when attempting to run a suite with class name: 
> org.apache.spark.mllib.recommendation.ALSSuite *** ABORTED *** (112 
> milliseconds)
>   java.lang.UnsatisfiedLinkError: 
> org.jblas.NativeBlas.dgemm(CCIIID[DII[DIID[DII)V
>   at org.jblas.NativeBlas.dgemm(Native Method)
>   at org.jblas.SimpleBlas.gemm(SimpleBlas.java:247)
>   at org.jblas.DoubleMatrix.mmuli(DoubleMatrix.java:1781)
>   at org.jblas.DoubleMatrix.mmul(DoubleMatrix.java:3138)
>   at 
> org.apache.spark.mllib.recommendation.ALSSuite$.generateRatings(ALSSuite.scala:74)
> {code}






[jira] [Created] (SPARK-13701) MLlib ALS fails on arm64 (java.lang.UnsatisfiedLinkError: org.jblas.NativeBlas.dgemm))

2016-03-05 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-13701:


 Summary: MLlib ALS fails on arm64 (java.lang.UnsatisfiedLinkError: 
org.jblas.NativeBlas.dgemm))
 Key: SPARK-13701
 URL: https://issues.apache.org/jira/browse/SPARK-13701
 Project: Spark
  Issue Type: Bug
  Components: MLlib
 Environment: Ubuntu 14.04 on aarch64
Reporter: Santiago M. Mola
Priority: Minor


jblas fails on arm64.

{code}
ALSSuite:
Exception encountered when attempting to run a suite with class name: 
org.apache.spark.mllib.recommendation.ALSSuite *** ABORTED *** (112 
milliseconds)
  java.lang.UnsatisfiedLinkError: 
org.jblas.NativeBlas.dgemm(CCIIID[DII[DIID[DII)V
  at org.jblas.NativeBlas.dgemm(Native Method)
  at org.jblas.SimpleBlas.gemm(SimpleBlas.java:247)
  at org.jblas.DoubleMatrix.mmuli(DoubleMatrix.java:1781)
  at org.jblas.DoubleMatrix.mmul(DoubleMatrix.java:3138)
  at 
org.apache.spark.mllib.recommendation.ALSSuite$.generateRatings(ALSSuite.scala:74)
{code}







[jira] [Commented] (SPARK-13690) UnsafeShuffleWriterSuite fails on arm64 (SnappyError, no native library is found)

2016-03-04 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15181362#comment-15181362
 ] 

Santiago M. Mola commented on SPARK-13690:
--

snappy-java does not have any fallback, but Snappy itself seems to work correctly on arm64. I submitted a PR for snappy-java, so a future version should include aarch64 support. This issue will have to wait until that version is released.

I don't expect active support for arm64, but given the latest developments in arm64 servers, I'm interested in experimenting with it. It seems I'm not the first one to think about it: http://www.sparkonarm.com/ ;-)
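
In the meantime, a hedged workaround sketch for running Spark itself on aarch64 (it does not help these Snappy-specific tests, which exercise the native library directly) is to select a pure-JVM compression codec; "lzf" is implemented in plain Java:

{code}
import org.apache.spark.SparkConf

// Workaround sketch: avoid the Snappy JNI path until a snappy-java release
// ships aarch64 binaries by using a codec with no native dependency.
val conf = new SparkConf()
  .setAppName("arm64-experiments")
  .set("spark.io.compression.codec", "lzf")
{code}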

> UnsafeShuffleWriterSuite fails on arm64 (SnappyError, no native library is 
> found)
> -
>
> Key: SPARK-13690
> URL: https://issues.apache.org/jira/browse/SPARK-13690
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
> Environment: $ java -version
> java version "1.8.0_73"
> Java(TM) SE Runtime Environment (build 1.8.0_73-b02)
> Java HotSpot(TM) 64-Bit Server VM (build 25.73-b02, mixed mode)
> $ uname -a
> Linux spark-on-arm 4.2.0-55598-g45f70e3 #5 SMP Tue Feb 2 10:14:08 CET 2016 
> aarch64 aarch64 aarch64 GNU/Linux
>Reporter: Santiago M. Mola
>Priority: Minor
>  Labels: arm64, porting
>
> UnsafeShuffleWriterSuite fails because of missing Snappy native library on 
> arm64.
> {code}
> Tests run: 19, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 6.437 sec 
> <<< FAILURE! - in org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite
> mergeSpillsWithFileStreamAndSnappy(org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite)
>   Time elapsed: 0.072 sec  <<< ERROR!
> java.lang.reflect.InvocationTargetException
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.testMergingSpills(UnsafeShuffleWriterSuite.java:337)
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.mergeSpillsWithFileStreamAndSnappy(UnsafeShuffleWriterSuite.java:389)
> Caused by: java.lang.IllegalArgumentException: org.xerial.snappy.SnappyError: 
> [FAILED_TO_LOAD_NATIVE_LIBRARY] no native library is found for os.name=Linux 
> and os.arch=aarch64
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.testMergingSpills(UnsafeShuffleWriterSuite.java:337)
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.mergeSpillsWithFileStreamAndSnappy(UnsafeShuffleWriterSuite.java:389)
> Caused by: org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] no 
> native library is found for os.name=Linux and os.arch=aarch64
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.testMergingSpills(UnsafeShuffleWriterSuite.java:337)
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.mergeSpillsWithFileStreamAndSnappy(UnsafeShuffleWriterSuite.java:389)
> mergeSpillsWithTransferToAndSnappy(org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite)
>   Time elapsed: 0.041 sec  <<< ERROR!
> java.lang.reflect.InvocationTargetException
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.testMergingSpills(UnsafeShuffleWriterSuite.java:337)
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.mergeSpillsWithTransferToAndSnappy(UnsafeShuffleWriterSuite.java:384)
> Caused by: java.lang.IllegalArgumentException: 
> java.lang.NoClassDefFoundError: Could not initialize class 
> org.xerial.snappy.Snappy
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.testMergingSpills(UnsafeShuffleWriterSuite.java:337)
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.mergeSpillsWithTransferToAndSnappy(UnsafeShuffleWriterSuite.java:384)
> Caused by: java.lang.NoClassDefFoundError: Could not initialize class 
> org.xerial.snappy.Snappy
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.testMergingSpills(UnsafeShuffleWriterSuite.java:337)
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.mergeSpillsWithTransferToAndSnappy(UnsafeShuffleWriterSuite.java:384)
> Running org.apache.spark.JavaAPISuite
> Tests run: 90, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 52.526 sec - 
> in org.apache.spark.JavaAPISuite
> Running org.apache.spark.unsafe.map.BytesToBytesMapOnHeapSuite
> Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.761 sec - 
> in org.apache.spark.unsafe.map.BytesToBytesMapOnHeapSuite
> Running org.apache.spark.unsafe.map.BytesToBytesMapOffHeapSuite
> Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 8.967 sec - 
> in org.apache.spark.unsafe.map.BytesToBytesMapOffHeapSuite
> Running org.apache.spark.api.java.OptionalSuite
> Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.003 sec - 
> in org.apache.spark.api.java.OptionalSu

[jira] [Created] (SPARK-13690) UnsafeShuffleWriterSuite fails on arm64 (SnappyError, no native library is found)

2016-03-04 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-13690:


 Summary: UnsafeShuffleWriterSuite fails on arm64 (SnappyError, no 
native library is found)
 Key: SPARK-13690
 URL: https://issues.apache.org/jira/browse/SPARK-13690
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.0.0
 Environment: $ java -version
java version "1.8.0_73"
Java(TM) SE Runtime Environment (build 1.8.0_73-b02)
Java HotSpot(TM) 64-Bit Server VM (build 25.73-b02, mixed mode)

$ uname -a
Linux spark-on-arm 4.2.0-55598-g45f70e3 #5 SMP Tue Feb 2 10:14:08 CET 2016 
aarch64 aarch64 aarch64 GNU/Linux
Reporter: Santiago M. Mola
Priority: Minor


UnsafeShuffleWriterSuite fails because of missing Snappy native library on 
arm64.


{code}
Tests run: 19, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 6.437 sec <<< 
FAILURE! - in org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite
mergeSpillsWithFileStreamAndSnappy(org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite)
  Time elapsed: 0.072 sec  <<< ERROR!
java.lang.reflect.InvocationTargetException
at 
org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.testMergingSpills(UnsafeShuffleWriterSuite.java:337)
at 
org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.mergeSpillsWithFileStreamAndSnappy(UnsafeShuffleWriterSuite.java:389)
Caused by: java.lang.IllegalArgumentException: org.xerial.snappy.SnappyError: 
[FAILED_TO_LOAD_NATIVE_LIBRARY] no native library is found for os.name=Linux 
and os.arch=aarch64
at 
org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.testMergingSpills(UnsafeShuffleWriterSuite.java:337)
at 
org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.mergeSpillsWithFileStreamAndSnappy(UnsafeShuffleWriterSuite.java:389)
Caused by: org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] no 
native library is found for os.name=Linux and os.arch=aarch64
at 
org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.testMergingSpills(UnsafeShuffleWriterSuite.java:337)
at 
org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.mergeSpillsWithFileStreamAndSnappy(UnsafeShuffleWriterSuite.java:389)

mergeSpillsWithTransferToAndSnappy(org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite)
  Time elapsed: 0.041 sec  <<< ERROR!
java.lang.reflect.InvocationTargetException
at 
org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.testMergingSpills(UnsafeShuffleWriterSuite.java:337)
at 
org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.mergeSpillsWithTransferToAndSnappy(UnsafeShuffleWriterSuite.java:384)
Caused by: java.lang.IllegalArgumentException: java.lang.NoClassDefFoundError: 
Could not initialize class org.xerial.snappy.Snappy
at 
org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.testMergingSpills(UnsafeShuffleWriterSuite.java:337)
at 
org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.mergeSpillsWithTransferToAndSnappy(UnsafeShuffleWriterSuite.java:384)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class 
org.xerial.snappy.Snappy
at 
org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.testMergingSpills(UnsafeShuffleWriterSuite.java:337)
at 
org.apache.spark.shuffle.sort.UnsafeShuffleWriterSuite.mergeSpillsWithTransferToAndSnappy(UnsafeShuffleWriterSuite.java:384)

Running org.apache.spark.JavaAPISuite
Tests run: 90, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 52.526 sec - 
in org.apache.spark.JavaAPISuite
Running org.apache.spark.unsafe.map.BytesToBytesMapOnHeapSuite
Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.761 sec - in 
org.apache.spark.unsafe.map.BytesToBytesMapOnHeapSuite
Running org.apache.spark.unsafe.map.BytesToBytesMapOffHeapSuite
Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 8.967 sec - in 
org.apache.spark.unsafe.map.BytesToBytesMapOffHeapSuite
Running org.apache.spark.api.java.OptionalSuite
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.003 sec - in 
org.apache.spark.api.java.OptionalSuite

Results :

Tests in error: 
  
UnsafeShuffleWriterSuite.mergeSpillsWithFileStreamAndSnappy:389->testMergingSpills:337
 » InvocationTarget
  
UnsafeShuffleWriterSuite.mergeSpillsWithTransferToAndSnappy:384->testMergingSpills:337
 » InvocationTarget
{code}






[jira] [Commented] (SPARK-12449) Pushing down arbitrary logical plans to data sources

2016-01-12 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093533#comment-15093533
 ] 

Santiago M. Mola commented on SPARK-12449:
--

Implementing this interface or an equivalent one would help standardize a lot of the advanced features that data sources have been implementing for some time. It would also keep them from creating their own SQLContext variants or patching the running SQLContext at runtime (using extraStrategies).

Here's a list of data sources that currently use this approach; it would be good to take them into account for this JIRA, and the proposed interface and strategy should probably support all of these use cases. Some of them also use their own catalog implementation, but that should be handled in a separate JIRA. A minimal sketch of the extraStrategies pattern they share follows the list.

*spark-sql-on-hbase*

Already mentioned by [~yzhou2001]. They are using HBaseContext with 
extraStrategies that inject HBaseStrategies doing aggregation push down:
https://github.com/Huawei-Spark/Spark-SQL-on-HBase/blob/master/src/main/scala/org/apache/spark/sql/hbase/execution/HBaseStrategies.scala

*memsql-spark-connector*

They offer either their own SQLContext or runtime injection of their MemSQL-specific pushdown strategy. They match Catalyst's LogicalPlan in the same way we're proposing here, pushing down filters, projects, aggregates, limits, sorts and joins:
https://github.com/memsql/memsql-spark-connector/blob/master/connectorLib/src/main/scala/com/memsql/spark/pushdown/MemSQLPushdownStrategy.scala

*spark-iqmulus*

Strategy injected to push down counts and some aggregates:

https://github.com/IGNF/spark-iqmulus/blob/master/src/main/scala/fr/ign/spark/iqmulus/ExtraStrategies.scala

*druid-olap*

They use SparkPlanner, Strategy and LogicalPlan APIs to do extensive push down. 
Their API usage could be limited to LogicalPlan only if this JIRA is 
implemented:

https://github.com/SparklineData/spark-druid-olap/blob/master/src/main/scala/org/apache/spark/sql/sources/druid/

*magellan* _(probably out of scope)_

It implements its own BroadcastJoin, although it seems to me that this usage would be out of scope for us.

https://github.com/harsha2010/magellan/blob/master/src/main/scala/magellan/execution/MagellanStrategies.scala
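
For reference, the shared pattern in all of the projects above looks roughly like the sketch below. It is only an illustration of the Spark 1.6-era extraStrategies hook; the strategy name and its no-op body are hypothetical and not taken from any of these connectors:

{code}
import org.apache.spark.sql.Strategy
import org.apache.spark.sql.catalyst.plans.logical.{Limit, LogicalPlan}
import org.apache.spark.sql.execution.SparkPlan

// A connector-style strategy: inspect the logical plan and, for supported
// shapes, return a physical plan that executes them inside the data source.
object ExamplePushdownStrategy extends Strategy {
  def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    case Limit(limitExpr, child) =>
      // A real connector would translate `child` plus the limit into a
      // source-side query and return a custom SparkPlan here.
      Nil // returning Nil falls back to Spark's built-in planning
    case _ => Nil
  }
}

// Registered at runtime, which is exactly the pattern this JIRA aims to replace:
// sqlContext.experimental.extraStrategies ++= Seq(ExamplePushdownStrategy)
{code}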

> Pushing down arbitrary logical plans to data sources
> 
>
> Key: SPARK-12449
> URL: https://issues.apache.org/jira/browse/SPARK-12449
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Stephan Kessler
> Attachments: pushingDownLogicalPlans.pdf
>
>
> With the help of the DataSource API we can pull data from external sources 
> for processing. Implementing interfaces such as {{PrunedFilteredScan}} allows 
> to push down filters and projects pruning unnecessary fields and rows 
> directly in the data source.
> However, data sources such as SQL Engines are capable of doing even more 
> preprocessing, e.g., evaluating aggregates. This is beneficial because it 
> would reduce the amount of data transferred from the source to Spark. The 
> existing interfaces do not allow such kind of processing in the source.
> We would propose to add a new interface {{CatalystSource}} that allows to 
> defer the processing of arbitrary logical plans to the data source. We have 
> already shown the details at the Spark Summit 2015 Europe 
> [https://spark-summit.org/eu-2015/events/the-pushdown-of-everything/]
> I will add a design document explaining details. 






[jira] [Commented] (SPARK-12449) Pushing down arbitrary logical plans to data sources

2015-12-23 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15070100#comment-15070100
 ] 

Santiago M. Mola commented on SPARK-12449:
--

Well, at least with the implementation presented at the Spark Summit, only the 
logical plan is required. The physical plan is handled only by the planner 
strategy, which would be internal to Spark.

The strategy has all the logic required to split partial ops and push down only 
one part.

> Pushing down arbitrary logical plans to data sources
> 
>
> Key: SPARK-12449
> URL: https://issues.apache.org/jira/browse/SPARK-12449
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Stephan Kessler
> Attachments: pushingDownLogicalPlans.pdf
>
>
> With the help of the DataSource API we can pull data from external sources 
> for processing. Implementing interfaces such as {{PrunedFilteredScan}} allows 
> to push down filters and projects pruning unnecessary fields and rows 
> directly in the data source.
> However, data sources such as SQL Engines are capable of doing even more 
> preprocessing, e.g., evaluating aggregates. This is beneficial because it 
> would reduce the amount of data transferred from the source to Spark. The 
> existing interfaces do not allow such kind of processing in the source.
> We would propose to add a new interface {{CatalystSource}} that allows to 
> defer the processing of arbitrary logical plans to the data source. We have 
> already shown the details at the Spark Summit 2015 Europe 
> [https://spark-summit.org/eu-2015/events/the-pushdown-of-everything/]
> I will add a design document explaining details. 






[jira] [Commented] (SPARK-12449) Pushing down arbitrary logical plans to data sources

2015-12-23 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15070062#comment-15070062
 ] 

Santiago M. Mola commented on SPARK-12449:
--

The physical plan would not be consumed by data sources, only the logical plan. 

An alternative approach would be to use a different representation to pass the 
logical plan to the data source. If the relational algebra from Apache Calcite 
is stable enough, it could be used as the logical plan representation for this 
interface. 

> Pushing down arbitrary logical plans to data sources
> 
>
> Key: SPARK-12449
> URL: https://issues.apache.org/jira/browse/SPARK-12449
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Stephan Kessler
> Attachments: pushingDownLogicalPlans.pdf
>
>
> With the help of the DataSource API we can pull data from external sources 
> for processing. Implementing interfaces such as {{PrunedFilteredScan}} allows 
> to push down filters and projects pruning unnecessary fields and rows 
> directly in the data source.
> However, data sources such as SQL Engines are capable of doing even more 
> preprocessing, e.g., evaluating aggregates. This is beneficial because it 
> would reduce the amount of data transferred from the source to Spark. The 
> existing interfaces do not allow such kind of processing in the source.
> We would propose to add a new interface {{CatalystSource}} that allows to 
> defer the processing of arbitrary logical plans to the data source. We have 
> already shown the details at the Spark Summit 2015 Europe 
> [https://spark-summit.org/eu-2015/events/the-pushdown-of-everything/]
> I will add a design document explaining details. 






[jira] [Commented] (SPARK-11855) Catalyst breaks backwards compatibility in branch-1.6

2015-12-22 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068396#comment-15068396
 ] 

Santiago M. Mola commented on SPARK-11855:
--

I will not have time to finish this before 1.6 release. Feel free to close the 
issue, since it won't apply after the release.

> Catalyst breaks backwards compatibility in branch-1.6
> -
>
> Key: SPARK-11855
> URL: https://issues.apache.org/jira/browse/SPARK-11855
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Santiago M. Mola
>Priority: Critical
>
> There's a number of APIs broken in catalyst 1.6.0. I'm trying to compile most 
> cases:
> *UnresolvedRelation*'s constructor has been changed from taking a Seq to a 
> TableIdentifier. A deprecated constructor taking Seq would be needed to be 
> backwards compatible.
> {code}
>  case class UnresolvedRelation(
> -tableIdentifier: Seq[String],
> +tableIdentifier: TableIdentifier,
>  alias: Option[String] = None) extends LeafNode {
> {code}
> It is similar with *UnresolvedStar*:
> {code}
> -case class UnresolvedStar(table: Option[String]) extends Star with 
> Unevaluable {
> +case class UnresolvedStar(target: Option[Seq[String]]) extends Star with 
> Unevaluable {
> {code}
> *Catalog* did get a lot of signatures changed too (because of 
> TableIdentifier). Providing the older methods as deprecated also seems viable 
> here.
> Spark 1.5 already broke backwards compatibility of part of catalyst API with 
> respect to 1.4. I understand there are good reasons for some cases, but we 
> should try to minimize backwards compatibility breakages for 1.x. Specially 
> now that 2.x is on the horizon and there will be a near opportunity to remove 
> deprecated stuff.






[jira] [Commented] (SPARK-12449) Pushing down arbitrary logical plans to data sources

2015-12-21 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15066655#comment-15066655
 ] 

Santiago M. Mola commented on SPARK-12449:
--

At Stratio we are interested in this kind of interface too, both for SQL and 
NoSQL data sources (e.g. MongoDB).

> Pushing down arbitrary logical plans to data sources
> 
>
> Key: SPARK-12449
> URL: https://issues.apache.org/jira/browse/SPARK-12449
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Stephan Kessler
> Attachments: pushingDownLogicalPlans.pdf
>
>
> With the help of the DataSource API we can pull data from external sources 
> for processing. Implementing interfaces such as {{PrunedFilteredScan}} allows 
> to push down filters and projects pruning unnecessary fields and rows 
> directly in the data source.
> However, data sources such as SQL Engines are capable of doing even more 
> preprocessing, e.g., evaluating aggregates. This is beneficial because it 
> would reduce the amount of data transferred from the source to Spark. The 
> existing interfaces do not allow such kind of processing in the source.
> We would propose to add a new interface {{CatalystSource}} that allows to 
> defer the processing of arbitrary logical plans to the data source. We have 
> already shown the details at the Spark Summit 2015 Europe 
> [https://spark-summit.org/eu-2015/events/the-pushdown-of-everything/]
> I will add a design document explaining details. 






[jira] [Commented] (SPARK-11855) Catalyst breaks backwards compatibility in branch-1.6

2015-11-19 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15014193#comment-15014193
 ] 

Santiago M. Mola commented on SPARK-11855:
--

Thanks Michael. Sounds reasonable. I'll prepare a PR reducing the 
incompatibilities where it can be done in a non-invasive way. 

> Catalyst breaks backwards compatibility in branch-1.6
> -
>
> Key: SPARK-11855
> URL: https://issues.apache.org/jira/browse/SPARK-11855
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Santiago M. Mola
>Priority: Critical
>
> There's a number of APIs broken in catalyst 1.6.0. I'm trying to compile most 
> cases:
> *UnresolvedRelation*'s constructor has been changed from taking a Seq to a 
> TableIdentifier. A deprecated constructor taking Seq would be needed to be 
> backwards compatible.
> {code}
>  case class UnresolvedRelation(
> -tableIdentifier: Seq[String],
> +tableIdentifier: TableIdentifier,
>  alias: Option[String] = None) extends LeafNode {
> {code}
> It is similar with *UnresolvedStar*:
> {code}
> -case class UnresolvedStar(table: Option[String]) extends Star with 
> Unevaluable {
> +case class UnresolvedStar(target: Option[Seq[String]]) extends Star with 
> Unevaluable {
> {code}
> *Catalog* did get a lot of signatures changed too (because of 
> TableIdentifier). Providing the older methods as deprecated also seems viable 
> here.
> Spark 1.5 already broke backwards compatibility of part of catalyst API with 
> respect to 1.4. I understand there are good reasons for some cases, but we 
> should try to minimize backwards compatibility breakages for 1.x. Specially 
> now that 2.x is on the horizon and there will be a near opportunity to remove 
> deprecated stuff.






[jira] [Updated] (SPARK-11855) Catalyst breaks backwards compatibility in branch-1.6

2015-11-19 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-11855:
-
Description: 
There's a number of APIs broken in catalyst 1.6.0. I'm trying to compile most 
cases:

*UnresolvedRelation*'s constructor has been changed from taking a Seq to a 
TableIdentifier. A deprecated constructor taking Seq would be needed to be 
backwards compatible.

{code}
 case class UnresolvedRelation(
-tableIdentifier: Seq[String],
+tableIdentifier: TableIdentifier,
 alias: Option[String] = None) extends LeafNode {
{code}

It is similar with *UnresolvedStar*:

{code}
-case class UnresolvedStar(table: Option[String]) extends Star with Unevaluable 
{
+case class UnresolvedStar(target: Option[Seq[String]]) extends Star with 
Unevaluable {
{code}

*Catalog* did get a lot of signatures changed too (because of TableIdentifier). 
Providing the older methods as deprecated also seems viable here.

Spark 1.5 already broke backwards compatibility for part of the catalyst API with respect to 1.4. I understand there are good reasons for some cases, but we should try to minimize backwards-compatibility breakage for 1.x, especially now that 2.x is on the horizon and there will soon be an opportunity to remove deprecated code.

  was:
There's a number of APIs broken in catalyst 1.6.0. I'm trying to compile most 
cases:

*UnresolvedRelation*'s constructor has been changed from taking a Seq to a 
TableIdentifier. A deprecated constructor taking Seq would be needed to be 
backwards compatible.

{code}
 case class UnresolvedRelation(
-tableIdentifier: Seq[String],
+tableIdentifier: TableIdentifier,
 alias: Option[String] = None) extends LeafNode {
{code}

It is similar with *UnresolvedStar*:

{code}
-case class UnresolvedStar(table: Option[String]) extends Star with Unevaluable 
{
+case class UnresolvedStar(target: Option[Seq[String]]) extends Star with 
Unevaluable {
{code}

Spark 1.5 already broke backwards compatibility of part of catalyst API with 
respect to 1.4. I understand there are good reasons for some cases, but we 
should try to minimize backwards compatibility breakages for 1.x. Specially now 
that 2.x is on the horizon and there will be a near opportunity to remove 
deprecated stuff.


> Catalyst breaks backwards compatibility in branch-1.6
> -
>
> Key: SPARK-11855
> URL: https://issues.apache.org/jira/browse/SPARK-11855
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Santiago M. Mola
>Priority: Critical
>
> There's a number of APIs broken in catalyst 1.6.0. I'm trying to compile most 
> cases:
> *UnresolvedRelation*'s constructor has been changed from taking a Seq to a 
> TableIdentifier. A deprecated constructor taking Seq would be needed to be 
> backwards compatible.
> {code}
>  case class UnresolvedRelation(
> -tableIdentifier: Seq[String],
> +tableIdentifier: TableIdentifier,
>  alias: Option[String] = None) extends LeafNode {
> {code}
> It is similar with *UnresolvedStar*:
> {code}
> -case class UnresolvedStar(table: Option[String]) extends Star with 
> Unevaluable {
> +case class UnresolvedStar(target: Option[Seq[String]]) extends Star with 
> Unevaluable {
> {code}
> *Catalog* did get a lot of signatures changed too (because of 
> TableIdentifier). Providing the older methods as deprecated also seems viable 
> here.
> Spark 1.5 already broke backwards compatibility of part of catalyst API with 
> respect to 1.4. I understand there are good reasons for some cases, but we 
> should try to minimize backwards compatibility breakages for 1.x. Specially 
> now that 2.x is on the horizon and there will be a near opportunity to remove 
> deprecated stuff.






[jira] [Commented] (SPARK-11855) Catalyst breaks backwards compatibility in branch-1.6

2015-11-19 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15013914#comment-15013914
 ] 

Santiago M. Mola commented on SPARK-11855:
--

They have public visibility and no @DeveloperApi or @Experimental annotations, so I always assumed they were part of the public API. I have been working with catalyst on a day-to-day basis for almost a year now. I understand that catalyst might not offer the same kind of backwards compatibility as spark-core, but it would be good to avoid breaking backwards compatibility, especially in cases where it is easy to avoid (which are most of the cases I encounter).

I think part of the solution is also marking some parts as @Experimental. For example, the UnsafeArrayData interface changed wildly, and it's probably not viable to maintain backwards compatibility there, but it should be marked as @Experimental if more breakage is expected before 2.0.
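
For illustration only (the class name here is made up), flagging such an API is a one-line change:

{code}
import org.apache.spark.annotation.Experimental

/**
 * :: Experimental ::
 * Hypothetical example of marking a still-unstable catalyst-facing API so
 * that users expect breakage before 2.0.
 */
@Experimental
class UnstableExpressionHelper
{code}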

> Catalyst breaks backwards compatibility in branch-1.6
> -
>
> Key: SPARK-11855
> URL: https://issues.apache.org/jira/browse/SPARK-11855
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Santiago M. Mola
>Priority: Critical
>
> There's a number of APIs broken in catalyst 1.6.0. I'm trying to compile most 
> cases:
> *UnresolvedRelation*'s constructor has been changed from taking a Seq to a 
> TableIdentifier. A deprecated constructor taking Seq would be needed to be 
> backwards compatible.
> {code}
>  case class UnresolvedRelation(
> -tableIdentifier: Seq[String],
> +tableIdentifier: TableIdentifier,
>  alias: Option[String] = None) extends LeafNode {
> {code}
> It is similar with *UnresolvedStar*:
> {code}
> -case class UnresolvedStar(table: Option[String]) extends Star with 
> Unevaluable {
> +case class UnresolvedStar(target: Option[Seq[String]]) extends Star with 
> Unevaluable {
> {code}
> Spark 1.5 already broke backwards compatibility of part of catalyst API with 
> respect to 1.4. I understand there are good reasons for some cases, but we 
> should try to minimize backwards compatibility breakages for 1.x. Specially 
> now that 2.x is on the horizon and there will be a near opportunity to remove 
> deprecated stuff.






[jira] [Updated] (SPARK-11855) Catalyst breaks backwards compatibility in branch-1.6

2015-11-19 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-11855:
-
   Priority: Critical  (was: Major)
Description: 
There's a number of APIs broken in catalyst 1.6.0. I'm trying to compile most 
cases:

*UnresolvedRelation*'s constructor has been changed from taking a Seq to a 
TableIdentifier. A deprecated constructor taking Seq would be needed to be 
backwards compatible.

{code}
 case class UnresolvedRelation(
-tableIdentifier: Seq[String],
+tableIdentifier: TableIdentifier,
 alias: Option[String] = None) extends LeafNode {
{code}

It is similar with *UnresolvedStar*:

{code}
-case class UnresolvedStar(table: Option[String]) extends Star with Unevaluable 
{
+case class UnresolvedStar(target: Option[Seq[String]]) extends Star with 
Unevaluable {
{code}

Spark 1.5 already broke backwards compatibility for part of the catalyst API with respect to 1.4. I understand there are good reasons for some cases, but we should try to minimize backwards-compatibility breakage for 1.x, especially now that 2.x is on the horizon and there will soon be an opportunity to remove deprecated code.

  was:
UnresolvedRelation's constructor has been changed from taking a Seq to a 
TableIdentifier. A deprecated constructor taking Seq would be needed to be 
backwards compatible.

{code}
 case class UnresolvedRelation(
-tableIdentifier: Seq[String],
+tableIdentifier: TableIdentifier,
 alias: Option[String] = None) extends LeafNode {
{code}

It is similar with UnresolvedStar:

{code}
-case class UnresolvedStar(table: Option[String]) extends Star with Unevaluable 
{
+case class UnresolvedStar(target: Option[Seq[String]]) extends Star with 
Unevaluable {
{code}


> Catalyst breaks backwards compatibility in branch-1.6
> -
>
> Key: SPARK-11855
> URL: https://issues.apache.org/jira/browse/SPARK-11855
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Santiago M. Mola
>Priority: Critical
>
> There's a number of APIs broken in catalyst 1.6.0. I'm trying to compile most 
> cases:
> *UnresolvedRelation*'s constructor has been changed from taking a Seq to a 
> TableIdentifier. A deprecated constructor taking Seq would be needed to be 
> backwards compatible.
> {code}
>  case class UnresolvedRelation(
> -tableIdentifier: Seq[String],
> +tableIdentifier: TableIdentifier,
>  alias: Option[String] = None) extends LeafNode {
> {code}
> It is similar with *UnresolvedStar*:
> {code}
> -case class UnresolvedStar(table: Option[String]) extends Star with 
> Unevaluable {
> +case class UnresolvedStar(target: Option[Seq[String]]) extends Star with 
> Unevaluable {
> {code}
> Spark 1.5 already broke backwards compatibility of part of catalyst API with 
> respect to 1.4. I understand there are good reasons for some cases, but we 
> should try to minimize backwards compatibility breakages for 1.x. Specially 
> now that 2.x is on the horizon and there will be a near opportunity to remove 
> deprecated stuff.






[jira] [Updated] (SPARK-11855) Catalyst breaks backwards compatibility in branch-1.6

2015-11-19 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-11855:
-
Summary: Catalyst breaks backwards compatibility in branch-1.6  (was: 
UnresolvedRelation/UnresolvedStar constructors are not backwards compatible in 
branch-1.6)

> Catalyst breaks backwards compatibility in branch-1.6
> -
>
> Key: SPARK-11855
> URL: https://issues.apache.org/jira/browse/SPARK-11855
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Santiago M. Mola
>
> UnresolvedRelation's constructor has been changed from taking a Seq to a 
> TableIdentifier. A deprecated constructor taking Seq would be needed to be 
> backwards compatible.
> {code}
>  case class UnresolvedRelation(
> -tableIdentifier: Seq[String],
> +tableIdentifier: TableIdentifier,
>  alias: Option[String] = None) extends LeafNode {
> {code}
> It is similar with UnresolvedStar:
> {code}
> -case class UnresolvedStar(table: Option[String]) extends Star with 
> Unevaluable {
> +case class UnresolvedStar(target: Option[Seq[String]]) extends Star with 
> Unevaluable {
> {code}






[jira] [Updated] (SPARK-11855) UnresolvedRelation/UnresolvedStar constructors are not backwards compatible in branch-1.6

2015-11-19 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-11855:
-
Summary: UnresolvedRelation/UnresolvedStar constructors are not backwards 
compatible in branch-1.6  (was: UnresolvedRelation constructor is not backwards 
compatible in branch-1.6)

> UnresolvedRelation/UnresolvedStar constructors are not backwards compatible 
> in branch-1.6
> -
>
> Key: SPARK-11855
> URL: https://issues.apache.org/jira/browse/SPARK-11855
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Santiago M. Mola
>
> UnresolvedRelation's constructor has been changed from taking a Seq to a 
> TableIdentifier. A deprecated constructor taking Seq would be needed to be 
> backwards compatible.
> {code}
>  case class UnresolvedRelation(
> -tableIdentifier: Seq[String],
> +tableIdentifier: TableIdentifier,
>  alias: Option[String] = None) extends LeafNode {
> {code}






[jira] [Updated] (SPARK-11855) UnresolvedRelation/UnresolvedStar constructors are not backwards compatible in branch-1.6

2015-11-19 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-11855:
-
Description: 
UnresolvedRelation's constructor has been changed from taking a Seq to a 
TableIdentifier. A deprecated constructor taking Seq would be needed to be 
backwards compatible.

{code}
 case class UnresolvedRelation(
-tableIdentifier: Seq[String],
+tableIdentifier: TableIdentifier,
 alias: Option[String] = None) extends LeafNode {
{code}

It is similar with UnresolvedStar:

{code}
-case class UnresolvedStar(table: Option[String]) extends Star with Unevaluable 
{
+case class UnresolvedStar(target: Option[Seq[String]]) extends Star with 
Unevaluable {
{code}

  was:
UnresolvedRelation's constructor has been changed from taking a Seq to a 
TableIdentifier. A deprecated constructor taking Seq would be needed to be 
backwards compatible.

{code}
 case class UnresolvedRelation(
-tableIdentifier: Seq[String],
+tableIdentifier: TableIdentifier,
 alias: Option[String] = None) extends LeafNode {
{code}


> UnresolvedRelation/UnresolvedStar constructors are not backwards compatible 
> in branch-1.6
> -
>
> Key: SPARK-11855
> URL: https://issues.apache.org/jira/browse/SPARK-11855
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: Santiago M. Mola
>
> UnresolvedRelation's constructor has been changed from taking a Seq to a 
> TableIdentifier. A deprecated constructor taking Seq would be needed to be 
> backwards compatible.
> {code}
>  case class UnresolvedRelation(
> -tableIdentifier: Seq[String],
> +tableIdentifier: TableIdentifier,
>  alias: Option[String] = None) extends LeafNode {
> {code}
> It is similar with UnresolvedStar:
> {code}
> -case class UnresolvedStar(table: Option[String]) extends Star with 
> Unevaluable {
> +case class UnresolvedStar(target: Option[Seq[String]]) extends Star with 
> Unevaluable {
> {code}






[jira] [Created] (SPARK-11855) UnresolvedRelation constructor is not backwards compatible in branch-1.6

2015-11-19 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-11855:


 Summary: UnresolvedRelation constructor is not backwards 
compatible in branch-1.6
 Key: SPARK-11855
 URL: https://issues.apache.org/jira/browse/SPARK-11855
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.6.0
Reporter: Santiago M. Mola


UnresolvedRelation's constructor has been changed from taking a Seq to a 
TableIdentifier. A deprecated constructor taking Seq would be needed to be 
backwards compatible.

{code}
 case class UnresolvedRelation(
-tableIdentifier: Seq[String],
+tableIdentifier: TableIdentifier,
 alias: Option[String] = None) extends LeafNode {
{code}
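
A hedged sketch of how the old Seq-based references could be mapped onto the new type (assuming the Seq held either {{Seq(table)}} or {{Seq(database, table)}}, as in 1.5); the helper name is made up and this is not a concrete API proposal:

{code}
import org.apache.spark.sql.catalyst.TableIdentifier

// Hypothetical compatibility helper: convert a pre-1.6 Seq[String] table
// reference into the new TableIdentifier form.
object CompatTableIdentifier {
  def fromSeq(parts: Seq[String]): TableIdentifier =
    TableIdentifier(parts.last, if (parts.size > 1) Some(parts.head) else None)
}

// CompatTableIdentifier.fromSeq(Seq("db", "t")) == TableIdentifier("t", Some("db"))
{code}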






[jira] [Created] (SPARK-11780) Provide type aliases in org.apache.spark.sql.types for backwards compatibility

2015-11-17 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-11780:


 Summary: Provide type aliases in org.apache.spark.sql.types for 
backwards compatibility
 Key: SPARK-11780
 URL: https://issues.apache.org/jira/browse/SPARK-11780
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.6.0
Reporter: Santiago M. Mola


With SPARK-11273, ArrayData, MapData and others were moved from  
org.apache.spark.sql.types to org.apache.spark.sql.catalyst.util.

Since this is a backward incompatible change, it would be good to provide type 
aliases from the old package (deprecated) to the new one.

For example:
{code}
package object types {
   @deprecated
   type ArrayData = org.apache.spark.sql.catalyst.util.ArrayData
}
{code}
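
Assuming the proposed alias were added, pre-1.6 client code would keep compiling unchanged (with only a deprecation warning), e.g.:

{code}
// Hypothetical usage: the old import resolves through the alias to the new
// location in org.apache.spark.sql.catalyst.util.
import org.apache.spark.sql.types.ArrayData

def firstElement(data: ArrayData): Int = data.getInt(0)
{code}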






[jira] [Updated] (SPARK-11186) Caseness inconsistency between SQLContext and HiveContext

2015-10-19 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-11186:
-
Description: 
Default catalog behaviour for caseness is different in {{SQLContext}} and 
{{HiveContext}}.

{code}
  test("Catalog caseness (SQL)") {
val sqlc = new SQLContext(sc)
val relationName = "MyTable"
sqlc.catalog.registerTable(relationName :: Nil, LogicalRelation(new 
BaseRelation {
  override def sqlContext: SQLContext = sqlc
  override def schema: StructType = StructType(Nil)
}))
val tables = sqlc.tableNames()
assert(tables.contains(relationName))
  }

  test("Catalog caseness (Hive)") {
val sqlc = new HiveContext(sc)
val relationName = "MyTable"
sqlc.catalog.registerTable(relationName :: Nil, LogicalRelation(new 
BaseRelation {
  override def sqlContext: SQLContext = sqlc
  override def schema: StructType = StructType(Nil)
}))
val tables = sqlc.tableNames()
assert(tables.contains(relationName))
  }
{code}

Looking at {{HiveContext#SQLSession}}, I see this is the intended behaviour, but the reason it is needed seems undocumented (both in the manual and in the source code comments).

  was:
Default catalog behaviour for caseness is different in {{SQLContext}} and 
{{HiveContext}}.

{code}
  test("Catalog caseness (SQL)") {
val sqlc = new SQLContext(sc)
val relationName = "MyTable"
sqlc.catalog.registerTable(relationName :: Nil, LogicalRelation(new 
BaseRelation {
  override def sqlContext: SQLContext = sqlc
  override def schema: StructType = StructType(Nil)
}))
val tables = sqlc.tableNames()
assert(tables.contains(relationName))
  }

  test("Catalog caseness (Hive)") {
val sqlc = new HiveContext(sc)
val relationName = "MyTable"
sqlc.catalog.registerTable(relationName :: Nil, LogicalRelation(new 
BaseRelation {
  override def sqlContext: SQLContext = sqlc
  override def schema: StructType = StructType(Nil)
}))
val tables = sqlc.tableNames()
assert(tables.contains(relationName))
  }
{/code}

Looking at {{HiveContext#SQLSession}}, I see this is the intended behaviour. 
But the reason that this is needed seems undocumented (both in the manual or in 
the source code comments).


> Caseness inconsistency between SQLContext and HiveContext
> -
>
> Key: SPARK-11186
> URL: https://issues.apache.org/jira/browse/SPARK-11186
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.1
>Reporter: Santiago M. Mola
>Priority: Minor
>
> Default catalog behaviour for caseness is different in {{SQLContext}} and 
> {{HiveContext}}.
> {code}
>   test("Catalog caseness (SQL)") {
> val sqlc = new SQLContext(sc)
> val relationName = "MyTable"
> sqlc.catalog.registerTable(relationName :: Nil, LogicalRelation(new 
> BaseRelation {
>   override def sqlContext: SQLContext = sqlc
>   override def schema: StructType = StructType(Nil)
> }))
> val tables = sqlc.tableNames()
> assert(tables.contains(relationName))
>   }
>   test("Catalog caseness (Hive)") {
> val sqlc = new HiveContext(sc)
> val relationName = "MyTable"
> sqlc.catalog.registerTable(relationName :: Nil, LogicalRelation(new 
> BaseRelation {
>   override def sqlContext: SQLContext = sqlc
>   override def schema: StructType = StructType(Nil)
> }))
> val tables = sqlc.tableNames()
> assert(tables.contains(relationName))
>   }
> {code}
> Looking at {{HiveContext#SQLSession}}, I see this is the intended behaviour. 
> But the reason that this is needed seems undocumented (both in the manual or 
> in the source code comments).






[jira] [Created] (SPARK-11186) Caseness inconsistency between SQLContext and HiveContext

2015-10-19 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-11186:


 Summary: Caseness inconsistency between SQLContext and HiveContext
 Key: SPARK-11186
 URL: https://issues.apache.org/jira/browse/SPARK-11186
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.5.1
Reporter: Santiago M. Mola
Priority: Minor


Default catalog behaviour for caseness is different in {{SQLContext}} and 
{{HiveContext}}.

{code}
  test("Catalog caseness (SQL)") {
val sqlc = new SQLContext(sc)
val relationName = "MyTable"
sqlc.catalog.registerTable(relationName :: Nil, LogicalRelation(new 
BaseRelation {
  override def sqlContext: SQLContext = sqlc
  override def schema: StructType = StructType(Nil)
}))
val tables = sqlc.tableNames()
assert(tables.contains(relationName))
  }

  test("Catalog caseness (Hive)") {
val sqlc = new HiveContext(sc)
val relationName = "MyTable"
sqlc.catalog.registerTable(relationName :: Nil, LogicalRelation(new 
BaseRelation {
  override def sqlContext: SQLContext = sqlc
  override def schema: StructType = StructType(Nil)
}))
val tables = sqlc.tableNames()
assert(tables.contains(relationName))
  }
{/code}

Looking at {{HiveContext#SQLSession}}, I see this is the intended behaviour. 
But the reason that this is needed seems undocumented (both in the manual or in 
the source code comments).






[jira] [Commented] (SPARK-7275) Make LogicalRelation public

2015-10-01 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14939660#comment-14939660
 ] 

Santiago M. Mola commented on SPARK-7275:
-

LogicalRelation was moved to execution.datasources in Spark 1.5, but it's still 
private[sql]. Can we make it public now?

> Make LogicalRelation public
> ---
>
> Key: SPARK-7275
> URL: https://issues.apache.org/jira/browse/SPARK-7275
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Santiago M. Mola
>Priority: Minor
>
> It seems LogicalRelation is the only part of the LogicalPlan that is not 
> public. This makes it harder to work with full logical plans from third party 
> packages.






[jira] [Commented] (SPARK-8377) Identifiers caseness information should be available at any time

2015-08-25 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710855#comment-14710855
 ] 

Santiago M. Mola commented on SPARK-8377:
-

Right. However, there is no distinction between an identifier that was quoted 
by the user and one that was not. So the user intent is lost. If we see "a", we 
don't know if the user wanted strictly "a" or case insensitive "a". So if we 
have a column "a" and a column "A", which one should we match?

> Identifiers caseness information should be available at any time
> 
>
> Key: SPARK-8377
> URL: https://issues.apache.org/jira/browse/SPARK-8377
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Santiago M. Mola
>
> Currently, we have the option of having a case sensitive catalog or not. A 
> case insensitive catalog just lowercases all identifiers. However, when 
> pushing down to a data source, we lose the information about if an identifier 
> should be case insensitive or strictly lowercase.
> Ideally, we would be able to distinguish a case insensitive identifier from a 
> case sensitive one.






[jira] [Created] (SPARK-9307) Logging: Make it either stable or private[spark]

2015-07-24 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-9307:
---

 Summary: Logging: Make it either stable or private[spark]
 Key: SPARK-9307
 URL: https://issues.apache.org/jira/browse/SPARK-9307
 Project: Spark
  Issue Type: Improvement
Reporter: Santiago M. Mola
Priority: Minor


org.apache.spark.Logging is a public class that is quite easy to include from 
any IDE, assuming it's safe to use because it's part of the public API.

However, its Javadoc states:

{code}
  NOTE: DO NOT USE this class outside of Spark. It is intended as an internal 
utility.
  This will likely be changed or removed in future releases.
{/code}

It would be safer to either make a commitment for the backwards-compatibility 
of this class, or make it private[spark].






[jira] [Updated] (SPARK-9307) Logging: Make it either stable or private[spark]

2015-07-24 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-9307:

Description: 
org.apache.spark.Logging is a public class that is quite easy to include from 
any IDE, assuming it's safe to use because it's part of the public API.

However, its Javadoc states:

{code}
  NOTE: DO NOT USE this class outside of Spark. It is intended as an internal 
utility.
  This will likely be changed or removed in future releases.
{code}

It would be safer to either make a commitment for the backwards-compatibility 
of this class, or make it private[spark].

  was:
org.apache.spark.Logging is a public class that is quite easy to include from 
any IDE, assuming it's safe to use because it's part of the public API.

However, its Javadoc states:

{code}
  NOTE: DO NOT USE this class outside of Spark. It is intended as an internal 
utility.
  This will likely be changed or removed in future releases.
{/code}

It would be safer to either make a commitment for the backwards-compatibility 
of this class, or make it private[spark].


> Logging: Make it either stable or private[spark]
> 
>
> Key: SPARK-9307
> URL: https://issues.apache.org/jira/browse/SPARK-9307
> Project: Spark
>  Issue Type: Improvement
>Reporter: Santiago M. Mola
>Priority: Minor
>
> org.apache.spark.Logging is a public class that is quite easy to include from 
> any IDE, assuming it's safe to use because it's part of the public API.
> However, its Javadoc states:
> {code}
>   NOTE: DO NOT USE this class outside of Spark. It is intended as an internal 
> utility.
>   This will likely be changed or removed in future releases.
> {code}
> It would be safer to either make a commitment for the backwards-compatibility 
> of this class, or make it private[spark].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6981) [SQL] SparkPlanner and QueryExecution should be factored out from SQLContext

2015-07-06 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614645#comment-14614645
 ] 

Santiago M. Mola commented on SPARK-6981:
-

Any progress on this?

> [SQL] SparkPlanner and QueryExecution should be factored out from SQLContext
> 
>
> Key: SPARK-6981
> URL: https://issues.apache.org/jira/browse/SPARK-6981
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.3.0, 1.4.0
>Reporter: Edoardo Vacchi
>Priority: Minor
>
> In order to simplify extensibility with new strategies from third parties, it 
> would be better to factor SparkPlanner and QueryExecution out into their own 
> classes. Dependent types add additional, unnecessary complexity; besides, 
> HiveContext would benefit from this change as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8636) CaseKeyWhen has incorrect NULL handling

2015-07-06 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14614627#comment-14614627
 ] 

Santiago M. Mola commented on SPARK-8636:
-

[~davies] NULL values are grouped together when using a GROUP BY clause.

See 
https://en.wikipedia.org/wiki/Null_%28SQL%29#When_two_nulls_are_equal:_grouping.2C_sorting.2C_and_some_set_operations

{quote}
Because SQL:2003 defines all Null markers as being unequal to one another, a 
special definition was required in order to group Nulls together when 
performing certain operations. SQL defines "any two values that are equal to 
one another, or any two Nulls", as "not distinct". This definition of not 
distinct allows SQL to group and sort Nulls when the GROUP BY clause (and other 
keywords that perform grouping) are used.
{quote}
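
For illustration only, a minimal Spark shell sketch of this behavior (assuming a 
1.x sqlContext with its implicits imported; not taken from the Spark test suite):

{code}
// Both NULL keys land in a single group under GROUP BY.
import sqlContext.implicits._

val df = Seq(("a", 1), (null: String, 2), (null: String, 3)).toDF("k", "v")
df.registerTempTable("t")

// Expected: one row ("a", 1) and one row (null, 2).
sqlContext.sql("SELECT k, COUNT(*) AS cnt FROM t GROUP BY k").show()
{code}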

> CaseKeyWhen has incorrect NULL handling
> ---
>
> Key: SPARK-8636
> URL: https://issues.apache.org/jira/browse/SPARK-8636
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.0
>Reporter: Santiago M. Mola
>  Labels: starter
>
> CaseKeyWhen implementation in Spark uses the following equals implementation:
> {code}
>   private def equalNullSafe(l: Any, r: Any) = {
> if (l == null && r == null) {
>   true
> } else if (l == null || r == null) {
>   false
> } else {
>   l == r
> }
>   }
> {code}
> This is not correct: in SQL, NULL is never equal to NULL (nor is it unequal; 
> the comparison is unknown). Consequently, a NULL value in a CASE WHEN 
> expression should never match.
> For example, you can execute this in MySQL:
> {code}
> SELECT CASE NULL WHEN NULL THEN "NULL MATCHES" ELSE "NULL DOES NOT MATCH" END 
> FROM DUAL;
> {code}
> And the result will be "NULL DOES NOT MATCH".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8636) CaseKeyWhen has incorrect NULL handling

2015-06-29 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605557#comment-14605557
 ] 

Santiago M. Mola commented on SPARK-8636:
-

[~davies], [~animeshbaranawal] In SQL, NULL is never equal to NULL. Any 
comparison to NULL is UNKNOWN. Most SQL implementations represent UNKNOWN as 
NULL, too.

> CaseKeyWhen has incorrect NULL handling
> ---
>
> Key: SPARK-8636
> URL: https://issues.apache.org/jira/browse/SPARK-8636
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.0
>Reporter: Santiago M. Mola
>  Labels: starter
>
> CaseKeyWhen implementation in Spark uses the following equals implementation:
> {code}
>   private def equalNullSafe(l: Any, r: Any) = {
> if (l == null && r == null) {
>   true
> } else if (l == null || r == null) {
>   false
> } else {
>   l == r
> }
>   }
> {code}
> This is not correct: in SQL, NULL is never equal to NULL (nor is it unequal; 
> the comparison is unknown). Consequently, a NULL value in a CASE WHEN 
> expression should never match.
> For example, you can execute this in MySQL:
> {code}
> SELECT CASE NULL WHEN NULL THEN "NULL MATCHES" ELSE "NULL DOES NOT MATCH" END 
> FROM DUAL;
> {code}
> And the result will be "NULL DOES NOT MATCH".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6064) Checking data types when resolving types

2015-06-26 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602755#comment-14602755
 ] 

Santiago M. Mola commented on SPARK-6064:
-

This issue might have been superseded by 
https://issues.apache.org/jira/browse/SPARK-7562

> Checking data types when resolving types
> 
>
> Key: SPARK-6064
> URL: https://issues.apache.org/jira/browse/SPARK-6064
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.3.0
>Reporter: Kai Zeng
>
> In catalyst/expressions/arithmetic.scala and 
> catalyst/expressions/predicates.scala, many arithmetic/predicate requires the 
> operands to be of certain numeric type. 
> These type checking codes should be done when we are resolving the 
> expressions.
> See this PR:
> https://github.com/apache/spark/pull/4685



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8654) Analysis exception when using "NULL IN (...)": invalid cast

2015-06-26 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-8654:
---

 Summary: Analysis exception when using "NULL IN (...)": invalid 
cast
 Key: SPARK-8654
 URL: https://issues.apache.org/jira/browse/SPARK-8654
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Santiago M. Mola
Priority: Minor


The following query throws an analysis exception:

{code}
SELECT * FROM t WHERE NULL NOT IN (1, 2, 3);
{code}

The exception is:

{code}
org.apache.spark.sql.AnalysisException: invalid cast from int to null;
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:42)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:66)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:52)
{code}

Here is a test that can be added to AnalysisSuite to check the issue:

{code}
  test("SPARK- regression test") {
val plan = Project(Alias(In(Literal(null), Seq(Literal(1), Literal(2))), 
"a")() :: Nil,
  LocalRelation()
)
caseInsensitiveAnalyze(plan)
  }
{code}

Note that this kind of query is a corner case, but it is still valid SQL. An 
expression such as "NULL IN (...)" or "NULL NOT IN (...)" always gives NULL as 
a result, even if the list contains NULL. So it is safe to translate these 
expressions to Literal(null) during analysis.
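
For illustration, here is a rough sketch of how such a simplification could be 
expressed as a Catalyst rule. This is not the actual Spark fix; it assumes the 
In, Not and Literal expressions and the Rule API as of the 1.4 line, and the 
exact Literal factory may differ between versions:

{code}
import org.apache.spark.sql.catalyst.expressions.{In, Literal, Not}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.types.BooleanType

object NullInToNullLiteral extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    // NULL IN (...) is always NULL, regardless of the list contents.
    case In(Literal(null, _), _) => Literal.create(null, BooleanType)
    // NULL NOT IN (...) is NOT(NULL), which is also NULL.
    case Not(In(Literal(null, _), _)) => Literal.create(null, BooleanType)
  }
}
{code}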



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8636) CaseKeyWhen has incorrect NULL handling

2015-06-26 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602520#comment-14602520
 ] 

Santiago M. Mola commented on SPARK-8636:
-

[~animeshbaranawal] Yes, I think so.

> CaseKeyWhen has incorrect NULL handling
> ---
>
> Key: SPARK-8636
> URL: https://issues.apache.org/jira/browse/SPARK-8636
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.0
>Reporter: Santiago M. Mola
>  Labels: starter
>
> CaseKeyWhen implementation in Spark uses the following equals implementation:
> {code}
>   private def equalNullSafe(l: Any, r: Any) = {
> if (l == null && r == null) {
>   true
> } else if (l == null || r == null) {
>   false
> } else {
>   l == r
> }
>   }
> {code}
> This is not correct: in SQL, NULL is never equal to NULL (nor is it unequal; 
> the comparison is unknown). Consequently, a NULL value in a CASE WHEN 
> expression should never match.
> For example, you can execute this in MySQL:
> {code}
> SELECT CASE NULL WHEN NULL THEN "NULL MATCHES" ELSE "NULL DOES NOT MATCH" END 
> FROM DUAL;
> {code}
> And the result will be "NULL DOES NOT MATCH".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8636) CaseKeyWhen has incorrect NULL handling

2015-06-25 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-8636:
---

 Summary: CaseKeyWhen has incorrect NULL handling
 Key: SPARK-8636
 URL: https://issues.apache.org/jira/browse/SPARK-8636
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.0
Reporter: Santiago M. Mola


CaseKeyWhen implementation in Spark uses the following equals implementation:

{code}
  private def equalNullSafe(l: Any, r: Any) = {
if (l == null && r == null) {
  true
} else if (l == null || r == null) {
  false
} else {
  l == r
}
  }
{code}

This is not correct: in SQL, NULL is never equal to NULL (nor is it unequal; the 
comparison is unknown). Consequently, a NULL value in a CASE WHEN expression 
should never match.

For example, you can execute this in MySQL:

{code}
SELECT CASE NULL WHEN NULL THEN "NULL MATCHES" ELSE "NULL DOES NOT MATCH" END 
FROM DUAL;
{code}

And the result will be "NULL DOES NOT MATCH".
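
For illustration only (this is not the eventual Spark patch), the key comparison 
would need SQL semantics, under which any NULL operand means "no match":

{code}
// Sketch: SQL-style matching for CASE key WHEN candidate THEN ...
// A comparison involving NULL is unknown, so it must never count as a match.
private def matchesKey(key: Any, candidate: Any): Boolean =
  if (key == null || candidate == null) false // NULL never matches, not even NULL vs NULL
  else key == candidate
{code}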



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8628) Race condition in AbstractSparkSQLParser.parse

2015-06-25 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-8628:

Description: 
SPARK-5009 introduced the following code in AbstractSparkSQLParser:

{code}
def parse(input: String): LogicalPlan = {
// Initialize the Keywords.
lexical.initialize(reservedWords)
phrase(start)(new lexical.Scanner(input)) match {
  case Success(plan, _) => plan
  case failureOrError => sys.error(failureOrError.toString)
}
  }
{code}

The corresponding initialize method in SqlLexical is not thread-safe:

{code}
  /* This is a work around to support the lazy setting */
  def initialize(keywords: Seq[String]): Unit = {
reserved.clear()
reserved ++= keywords
  }
{code}

I'm hitting this when parsing multiple SQL queries concurrently. When the 
parsing of one query starts, it empties the reserved keyword list; a race 
condition then occurs and other queries fail to parse because keywords are 
recognized as identifiers.

  was:
SPARK-5009 introduced the following code:

def parse(input: String): LogicalPlan = {
// Initialize the Keywords.
lexical.initialize(reservedWords)
phrase(start)(new lexical.Scanner(input)) match {
  case Success(plan, _) => plan
  case failureOrError => sys.error(failureOrError.toString)
}
  }

The corresponding initialize method in SqlLexical is not thread-safe:

  /* This is a work around to support the lazy setting */
  def initialize(keywords: Seq[String]): Unit = {
reserved.clear()
reserved ++= keywords
  }

I'm hitting this when parsing multiple SQL queries concurrently. When the 
parsing of one query starts, it empties the reserved keyword list; a race 
condition then occurs and other queries fail to parse because keywords are 
recognized as identifiers.


> Race condition in AbstractSparkSQLParser.parse
> --
>
> Key: SPARK-8628
> URL: https://issues.apache.org/jira/browse/SPARK-8628
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.3.0, 1.3.1, 1.4.0
>Reporter: Santiago M. Mola
>Priority: Critical
>  Labels: regression
>
> SPARK-5009 introduced the following code in AbstractSparkSQLParser:
> {code}
> def parse(input: String): LogicalPlan = {
> // Initialize the Keywords.
> lexical.initialize(reservedWords)
> phrase(start)(new lexical.Scanner(input)) match {
>   case Success(plan, _) => plan
>   case failureOrError => sys.error(failureOrError.toString)
> }
>   }
> {code}
> The corresponding initialize method in SqlLexical is not thread-safe:
> {code}
>   /* This is a work around to support the lazy setting */
>   def initialize(keywords: Seq[String]): Unit = {
> reserved.clear()
> reserved ++= keywords
>   }
> {code}
> I'm hitting this when parsing multiple SQL queries concurrently. When the 
> parsing of one query starts, it empties the reserved keyword list; a race 
> condition then occurs and other queries fail to parse because keywords are 
> recognized as identifiers.
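
One possible mitigation, sketched here only to illustrate the problem (it may or 
may not match the eventual upstream fix): run the keyword initialization exactly 
once instead of on every parse call.

{code}
// A lazy val initializer runs under a lock, so concurrent parse calls can no
// longer clear the reserved-word set while another parse is in flight.
protected lazy val initLexical: Unit = lexical.initialize(reservedWords)

def parse(input: String): LogicalPlan = {
  initLexical
  phrase(start)(new lexical.Scanner(input)) match {
    case Success(plan, _) => plan
    case failureOrError   => sys.error(failureOrError.toString)
  }
}
{code}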



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8628) Race condition in AbstractSparkSQLParser.parse

2015-06-25 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14601012#comment-14601012
 ] 

Santiago M. Mola commented on SPARK-8628:
-

Here is an example of failure with Spark 1.4.0:

{code}
[1.152] failure: ``union'' expected but identifier OR found

SELECT CASE a+1 WHEN b THEN 111 WHEN c THEN 222 WHEN d THEN 333 WHEN e THEN 444 
ELSE 555 END, a-b, a FROM t1 WHERE e+d BETWEEN a+b-10 AND c+130 OR a>b OR d>e

   ^
java.lang.RuntimeException: [1.152] failure: ``union'' expected but identifier 
OR found

SELECT CASE a+1 WHEN b THEN 111 WHEN c THEN 222 WHEN d THEN 333 WHEN e THEN 444 
ELSE 555 END, a-b, a FROM t1 WHERE e+d BETWEEN a+b-10 AND c+130 OR a>b OR d>e

   ^
at scala.sys.package$.error(package.scala:27)
{code}

> Race condition in AbstractSparkSQLParser.parse
> --
>
> Key: SPARK-8628
> URL: https://issues.apache.org/jira/browse/SPARK-8628
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.3.0, 1.3.1, 1.4.0
>Reporter: Santiago M. Mola
>Priority: Critical
>  Labels: regression
>
> SPARK-5009 introduced the following code:
> def parse(input: String): LogicalPlan = {
> // Initialize the Keywords.
> lexical.initialize(reservedWords)
> phrase(start)(new lexical.Scanner(input)) match {
>   case Success(plan, _) => plan
>   case failureOrError => sys.error(failureOrError.toString)
> }
>   }
> The corresponding initialize method in SqlLexical is not thread-safe:
>   /* This is a work around to support the lazy setting */
>   def initialize(keywords: Seq[String]): Unit = {
> reserved.clear()
> reserved ++= keywords
>   }
> I'm hitting this when parsing multiple SQL queries concurrently. When the 
> parsing of one query starts, it empties the reserved keyword list; a race 
> condition then occurs and other queries fail to parse because keywords are 
> recognized as identifiers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8628) Race condition in AbstractSparkSQLParser.parse

2015-06-25 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-8628:
---

 Summary: Race condition in AbstractSparkSQLParser.parse
 Key: SPARK-8628
 URL: https://issues.apache.org/jira/browse/SPARK-8628
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.0, 1.3.1, 1.3.0
Reporter: Santiago M. Mola
Priority: Critical


SPARK-5009 introduced the following code:

def parse(input: String): LogicalPlan = {
// Initialize the Keywords.
lexical.initialize(reservedWords)
phrase(start)(new lexical.Scanner(input)) match {
  case Success(plan, _) => plan
  case failureOrError => sys.error(failureOrError.toString)
}
  }

The corresponding initialize method in SqlLexical is not thread-safe:

  /* This is a work around to support the lazy setting */
  def initialize(keywords: Seq[String]): Unit = {
reserved.clear()
reserved ++= keywords
  }

I'm hitting this when parsing multiple SQL queries concurrently. When the 
parsing of one query starts, it empties the reserved keyword list; a race 
condition then occurs and other queries fail to parse because keywords are 
recognized as identifiers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6666) org.apache.spark.sql.jdbc.JDBCRDD does not escape/quote column names

2015-06-15 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585883#comment-14585883
 ] 

Santiago M. Mola commented on SPARK-:
-

I opened SPARK-8377 to track the general case, since I have this problem with 
other data sources, not just JDBC.

> org.apache.spark.sql.jdbc.JDBCRDD  does not escape/quote column names
> -
>
> Key: SPARK-
> URL: https://issues.apache.org/jira/browse/SPARK-
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.3.0
> Environment:  
>Reporter: John Ferguson
>Priority: Critical
>
> Is there a way to have JDBC DataFrames use quoted/escaped column names?  
> Right now, it looks like it "sees" the names correctly in the schema created 
> but does not escape them in the SQL it creates when they are not compliant:
> org.apache.spark.sql.jdbc.JDBCRDD
> 
> private val columnList: String = {
> val sb = new StringBuilder()
> columns.foreach(x => sb.append(",").append(x))
> if (sb.length == 0) "1" else sb.substring(1)
> }
> If you see value in this, I would take a shot at adding the quoting 
> (escaping) of column names here.  If this isn't done, some drivers, like 
> postgresql's, will simply fold all names to lower case when parsing the query. 
> As you can see in the TL;DR below, that means they won't match the schema I am 
> given.
> TL;DR:
>  
> I am able to connect to a Postgres database in the shell (with driver 
> referenced):
>val jdbcDf = 
> sqlContext.jdbc("jdbc:postgresql://localhost/sparkdemo?user=dbuser", "sp500")
> In fact when I run:
>jdbcDf.registerTempTable("sp500")
>val avgEPSNamed = sqlContext.sql("SELECT AVG(`Earnings/Share`) as AvgCPI 
> FROM sp500")
> and
>val avgEPSProg = jsonDf.agg(avg(jsonDf.col("Earnings/Share")))
> The values come back as expected.  However, if I try:
>jdbcDf.show
> Or if I try
>
>val all = sqlContext.sql("SELECT * FROM sp500")
>all.show
> I get errors about column names not being found.  In fact the error includes 
> a mention of column names all lower cased.  For now I will change my schema 
> to be more restrictive.  Right now it is, per a Stack Overflow poster, not 
> ANSI compliant by doing things that are allowed by ""'s in pgsql, MySQL and 
> SQLServer.  BTW, our users are giving us tables like this... because various 
> tools they already use support non-compliant names.  In fact, this is mild 
> compared to what we've had to support.
> Currently the schema in question uses mixed case, quoted names with special 
> characters and spaces:
> CREATE TABLE sp500
> (
> "Symbol" text,
> "Name" text,
> "Sector" text,
> "Price" double precision,
> "Dividend Yield" double precision,
> "Price/Earnings" double precision,
> "Earnings/Share" double precision,
> "Book Value" double precision,
> "52 week low" double precision,
> "52 week high" double precision,
> "Market Cap" double precision,
> "EBITDA" double precision,
> "Price/Sales" double precision,
> "Price/Book" double precision,
> "SEC Filings" text
> ) 
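
A rough sketch of the kind of change suggested above, using ANSI double-quote 
quoting for illustration (a complete fix would need per-database dialect 
handling):

{code}
// Sketch only: quote each projected column so mixed-case and special-character
// names survive into the generated SQL. `columns` is the field from the
// JDBCRDD snippet quoted above.
private val columnList: String = {
  def quoteIdentifier(name: String): String =
    "\"" + name.replace("\"", "\"\"") + "\""
  if (columns.isEmpty) "1" else columns.map(quoteIdentifier).mkString(", ")
}
{code}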



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8377) Identifiers caseness information should be available at any time

2015-06-15 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-8377:
---

 Summary: Identifiers caseness information should be available at 
any time
 Key: SPARK-8377
 URL: https://issues.apache.org/jira/browse/SPARK-8377
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Santiago M. Mola


Currently, we have the option of having a case sensitive catalog or not. A case 
insensitive catalog just lowercases all identifiers. However, when pushing down 
to a data source, we lose the information about whether an identifier should be 
case insensitive or strictly lowercase.

Ideally, we would be able to distinguish a case insensitive identifier from a 
case sensitive one.
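
A hypothetical sketch of what such a distinction could look like (these types do 
not exist in Spark; names are illustrative only):

{code}
// Keep caseness attached to the identifier instead of lowercasing it away, so a
// data source can tell "match case-insensitively" apart from "match exactly".
sealed trait SqlIdentifier {
  def name: String
  def matches(other: String): Boolean
}
case class CaseSensitiveIdentifier(name: String) extends SqlIdentifier {
  def matches(other: String): Boolean = name == other
}
case class CaseInsensitiveIdentifier(name: String) extends SqlIdentifier {
  def matches(other: String): Boolean = name.equalsIgnoreCase(other)
}
{code}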



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-8370) Add API for data sources to register databases

2015-06-15 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-8370:

Component/s: SQL

> Add API for data sources to register databases
> --
>
> Key: SPARK-8370
> URL: https://issues.apache.org/jira/browse/SPARK-8370
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Santiago M. Mola
>
> This API would allow registering a database with a data source instead of 
> just a table. Registering a data source database would register all its 
> tables and keep the catalog updated. The catalog could delegate lookups of 
> tables in a database registered with this API to the data source.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-8370) Add API for data sources to register databases

2015-06-15 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-8370:
---

 Summary: Add API for data sources to register databases
 Key: SPARK-8370
 URL: https://issues.apache.org/jira/browse/SPARK-8370
 Project: Spark
  Issue Type: New Feature
Reporter: Santiago M. Mola


This API would allow registering a database with a data source instead of just 
a table. Registering a data source database would register all its tables and 
keep the catalog updated. The catalog could delegate lookups of tables in a 
database registered with this API to the data source.
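
A hypothetical API sketch (none of these names exist in Spark) of what such a 
data source contract could look like:

{code}
import org.apache.spark.sql.sources.BaseRelation

// A data source implementing this could expose a whole database; the catalog
// could then delegate table lookups for that database to the provider.
trait DatabaseProvider {
  /** Name of the database this provider exposes. */
  def databaseName: String
  /** Table names currently available in the database. */
  def tableNames: Seq[String]
  /** Resolve a table by name, if it exists in the underlying source. */
  def lookupTable(tableName: String): Option[BaseRelation]
}
{code}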



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4867) UDF clean up

2015-05-26 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559423#comment-14559423
 ] 

Santiago M. Mola commented on SPARK-4867:
-

Maybe this issue can be split into smaller tasks? A lot of built-in functions 
can be removed from the parser quite easily by registering them in the 
FunctionRegistry. I am doing this with a lot of fixed-arity functions.

I'm using some helper functions to create FunctionBuilders for Expressions, for 
use with the FunctionRegistry. The main helper looks like this:

{code}
  def expression[T <: Expression](arity: Int)(implicit tag: ClassTag[T]): 
ExpressionBuilder = {
val argTypes = (1 to arity).map(x => classOf[Expression])
val constructor = tag.runtimeClass.getDeclaredConstructor(argTypes: _*)
(expressions: Seq[Expression]) => {
  if (expressions.size != arity) {
throw new IllegalArgumentException(
  s"Invalid number of arguments: ${expressions.size} (must be equal to 
$arity)"
)
  }
  constructor.newInstance(expressions: _*).asInstanceOf[Expression]
}
  }
{code}

and can be used like this:

{code}
functionRegistry.registerFunction("MY_FUNCTION", expression[MyFunction])
{code}

If this approach looks like what is needed, I can extend it to use expressions 
with a variable number of parameters. Also, with some syntactic sugar we can 
provide a function that works this way:

{code}
functionRegistry.registerFunction[MyFunction]
// Register the builder produced by expression[MyFunction] with name 
"MY_FUNCTION" by using a camelcase -> underscore-separated conversion.
{code}

How does this sound?
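
For what it's worth, the variable-arity variant mentioned above could be 
sketched roughly like this, assuming the same ExpressionBuilder alias and 
imports as the helper above and an Expression subclass whose constructor takes a 
single Seq[Expression] (e.g. Coalesce):

{code}
// Sketch of the variable-arity case: build the expression through a constructor
// taking one Seq[Expression] argument, with no arity check.
def expressionVarargs[T <: Expression](implicit tag: ClassTag[T]): ExpressionBuilder = {
  val constructor = tag.runtimeClass.getDeclaredConstructor(classOf[Seq[_]])
  (expressions: Seq[Expression]) =>
    constructor.newInstance(expressions).asInstanceOf[Expression]
}

// Usage, following the same registration style as above:
// functionRegistry.registerFunction("COALESCE", expressionVarargs[Coalesce])
{code}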


> UDF clean up
> 
>
> Key: SPARK-4867
> URL: https://issues.apache.org/jira/browse/SPARK-4867
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Michael Armbrust
>Priority: Blocker
>
> Right now our support and internal implementation of many functions has a few 
> issues.  Specifically:
>  - UDFs don't know their input types and thus don't do type coercion.
>  - We hard code a bunch of built in functions into the parser.  This is bad 
> because in SQL it creates new reserved words for things that aren't actually 
> keywords.  Also it means that for each function we need to add support to 
> both SQLContext and HiveContext separately.
> For this JIRA I propose we do the following:
>  - Change the interfaces for registerFunction and ScalaUdf to include types 
> for the input arguments as well as the output type.
>  - Add a rule to analysis that does type coercion for UDFs.
>  - Add a parse rule for functions to SQLParser.
>  - Rewrite all the UDFs that are currently hacked into the various parsers 
> using this new functionality.
> Depending on how big this refactoring becomes we could split parts 1&2 from 
> part 3 above.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4678) A SQL query with subquery fails with TreeNodeException

2015-05-26 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559335#comment-14559335
 ] 

Santiago M. Mola commented on SPARK-4678:
-

[~ozawa] Does this happen in more recent versions?

> A SQL query with subquery fails with TreeNodeException
> --
>
> Key: SPARK-4678
> URL: https://issues.apache.org/jira/browse/SPARK-4678
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.1
>Reporter: Tsuyoshi Ozawa
>
> {code}
> spark-sql> create external table if  NOT EXISTS randomText100GB(text string) 
> location 'hdfs:///user/ozawa/randomText100GB'; 
> spark-sql> CREATE TABLE wordcount AS
>  > SELECT word, count(1) AS count
>  > FROM (SELECT 
> EXPLODE(SPLIT(LCASE(REGEXP_REPLACE(text,'[\\p{Punct},\\p{Cntrl}]','')),' '))
>  > AS word FROM randomText100GB) words
>  > GROUP BY word;
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 9 in 
> stage 1.0 failed 4 times, most recent failure: Lost task 9.3 in stage 1.0 
> (TID 25, hadoop-slave2.c.gcp-s
> amples.internal): 
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding 
> attribute, tree: word#5
> 
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:47)
> 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:43)
> 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:42)
> 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:165)
> 
> org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:156)
> 
> org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:42)
> 
> org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:52)
> 
> org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection$$anonfun$$init$$2.apply(Projection.scala:52)
> 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
> 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
> scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
> scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
> scala.collection.AbstractTraversable.map(Traversable.scala:105)
> 
> org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.(Projection.scala:52)
> 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:106)
> 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$newMutableProjection$1.apply(SparkPlan.scala:106)
> 
> org.apache.spark.sql.execution.Project$$anonfun$1.apply(basicOperators.scala:43)
> 
> org.apache.spark.sql.execution.Project$$anonfun$1.apply(basicOperators.scala:42)
> org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
> org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
> 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> org.apache.spark.scheduler.Task.run(Task.scala:54)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
> 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3815) LPAD function does not work in where predicate

2015-05-26 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559331#comment-14559331
 ] 

Santiago M. Mola commented on SPARK-3815:
-

[~yanakad] Is this still present in more recent versions? If yes, could you 
provide a minimal test case (query + data)?

> LPAD function does not work in where predicate
> --
>
> Key: SPARK-3815
> URL: https://issues.apache.org/jira/browse/SPARK-3815
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Yana Kadiyska
>Priority: Minor
>
> select customer_id from mytable where 
> pkey=concat_ws('-',LPAD('077',4,'0'),'2014-07') LIMIT 2
> produces:
> 14/10/03 14:51:35 ERROR server.SparkSQLOperationManager: Error executing 
> query:
> org.apache.spark.SparkException: Task not serializable
> at 
> org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
> at 
> org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
> at org.apache.spark.SparkContext.clean(SparkContext.scala:1242)
> at org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:597)
> at 
> org.apache.spark.sql.execution.Limit.execute(basicOperators.scala:146)
> at 
> org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:360)
> at 
> org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:360)
> at 
> org.apache.spark.sql.hive.thriftserver.server.SparkSQLOperationManager$$anon$1.run(SparkSQLOperationManager.scala:185)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:193)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatement(HiveSessionImpl.java:175)
> at 
> org.apache.hive.service.cli.CLIService.executeStatement(CLIService.java:150)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:207)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1133)
> at 
> org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1118)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:58)
> at 
> org.apache.hive.service.auth.TUGIContainingProcessor$1.run(TUGIContainingProcessor.java:55)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
> at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:526)
> at 
> org.apache.hive.service.auth.TUGIContainingProcessor.process(TUGIContainingProcessor.java:55)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> Caused by: java.io.NotSerializableException: java.lang.reflect.Constructor
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
> at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
> at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
> at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
> at java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1377)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1173)
> at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
> at 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
> at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
> at 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
> at scala.collection.immutable.$colon$colon.writeObject(List.scala:379)
> at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl

[jira] [Commented] (SPARK-7012) Add support for NOT NULL modifier for column definitions on DDLParser

2015-05-26 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14559097#comment-14559097
 ] 

Santiago M. Mola commented on SPARK-7012:
-

[~6133d] SQLContext parses DDL statements (such as CREATE TEMPORARY TABLE) with 
an independent parser called DDLParser:
https://github.com/apache/spark/blob/f38e619c41d242143c916373f2a44ec674679f19/sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala#L87

The parsing of the columns for the schema is done in DDLParser.column:
https://github.com/apache/spark/blob/f38e619c41d242143c916373f2a44ec674679f19/sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala#L176
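
A simplified sketch of what the grammar change could look like, assuming the 
DDLParser context above (Keyword, ident, dataType and StructField in scope) and 
intentionally ignoring the COMMENT handling and other details of the real rule:

{code}
protected val NOT  = Keyword("NOT")
protected val NULL = Keyword("NULL")

// Accept an optional NOT NULL after the data type and reflect it in the
// StructField's nullable flag.
protected lazy val column: Parser[StructField] =
  ident ~ dataType ~ (NOT ~ NULL).? ^^ {
    case name ~ typ ~ notNull =>
      StructField(name, typ, nullable = notNull.isEmpty)
  }
{code}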

> Add support for NOT NULL modifier for column definitions on DDLParser
> -
>
> Key: SPARK-7012
> URL: https://issues.apache.org/jira/browse/SPARK-7012
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.3.0
>Reporter: Santiago M. Mola
>Priority: Minor
>  Labels: easyfix
>
> Add support for NOT NULL modifier for column definitions on DDLParser. This 
> would add support for the following syntax:
> CREATE TEMPORARY TABLE (field INTEGER NOT NULL) ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3846) [SQL] Serialization exception (Kryo) on joins when enabling codegen

2015-05-26 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-3846:

Summary: [SQL] Serialization exception (Kryo) on joins when enabling 
codegen   (was: [SQL] Serialization exception (Kryo and Java) on joins when 
enabling codegen )

> [SQL] Serialization exception (Kryo) on joins when enabling codegen 
> 
>
> Key: SPARK-3846
> URL: https://issues.apache.org/jira/browse/SPARK-3846
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0, 1.2.0
>Reporter: Jianshi Huang
>Priority: Blocker
>
> The error is reproducible when I join two tables manually. The error message 
> is like follows.
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 645 
> in stage 3.0 failed 4 times, most recent failure: Lost task 645.3 in stage 
> 3.0 (TID 3802, ...): com.esotericsoftware.kryo.KryoException:
> Unable to find class: 
> __wrapper$1$18e31777385a452ba0bc030e899bf5d1.__wrapper$1$18e31777385a452ba0bc030e899bf5d1$SpecificRow$1
> 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
> 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
> com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721)
> com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:42)
> com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:34)
> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
> 
> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:133)
> 
> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133)
> org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
> scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
> 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
> 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> 
> org.apache.spark.sql.execution.HashJoin$$anon$1.hasNext(joins.scala:101)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> 
> org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$8.apply(GeneratedAggregate.scala:198)
> 
> org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$8.apply(GeneratedAggregate.scala:165)
> org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
> org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
> 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> org.apache.spark.scheduler.Task.run(Task.scala:56)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
> 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:724)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5707) Enabling spark.sql.codegen throws ClassNotFound exception

2015-05-26 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14558823#comment-14558823
 ] 

Santiago M. Mola commented on SPARK-5707:
-

This is probably a duplicate of SPARK-3846.

> Enabling spark.sql.codegen throws ClassNotFound exception
> -
>
> Key: SPARK-5707
> URL: https://issues.apache.org/jira/browse/SPARK-5707
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.0, 1.3.1
> Environment: yarn-client mode, spark.sql.codegen=true
>Reporter: Yi Yao
>Assignee: Ram Sriharsha
>Priority: Blocker
>
> Exception thrown:
> {noformat}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 13 in 
> stage 133.0 failed 4 times, most recent failure: Lost task 13.3 in stage 
> 133.0 (TID 3066, cdh52-node2): java.io.IOException: 
> com.esotericsoftware.kryo.KryoException: Unable to find class: 
> __wrapper$1$81257352e1c844aebf09cb84fe9e7459.__wrapper$1$81257352e1c844aebf09cb84fe9e7459$SpecificRow$1
> Serialization trace:
> hashTable (org.apache.spark.sql.execution.joins.UniqueKeyHashedRelation)
> at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1011)
> at 
> org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
> at 
> org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
> at 
> org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
> at 
> org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
> at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
> at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoin$$anonfun$3.apply(BroadcastHashJoin.scala:62)
> at 
> org.apache.spark.sql.execution.joins.BroadcastHashJoin$$anonfun$3.apply(BroadcastHashJoin.scala:61)
> at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:601)
> at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:601)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
> at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
> at org.apache.spark.rdd.CartesianRDD.compute(CartesianRDD.scala:75)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
> at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
> at org.apache.spark.rdd.CartesianRDD.compute(CartesianRDD.scala:75)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
> at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
> at org.apache.spark.scheduler.Task.run(Task.scala:56)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
> at 
> java.util.concurrent.ThreadPoolExecutor.r

[jira] [Updated] (SPARK-3846) [SQL] Serialization exception (Kryo and Java) on joins when enabling codegen

2015-05-26 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-3846:

Summary: [SQL] Serialization exception (Kryo and Java) on joins when 
enabling codegen   (was: KryoException when doing joins in SparkSQL )

> [SQL] Serialization exception (Kryo and Java) on joins when enabling codegen 
> -
>
> Key: SPARK-3846
> URL: https://issues.apache.org/jira/browse/SPARK-3846
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0, 1.2.0
>Reporter: Jianshi Huang
>Priority: Blocker
>
> The error is reproducible when I join two tables manually. The error message 
> is like follows.
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 645 
> in stage 3.0 failed 4 times, most recent failure: Lost task 645.3 in stage 
> 3.0 (TID 3802, ...): com.esotericsoftware.kryo.KryoException:
> Unable to find class: 
> __wrapper$1$18e31777385a452ba0bc030e899bf5d1.__wrapper$1$18e31777385a452ba0bc030e899bf5d1$SpecificRow$1
> 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
> 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
> com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721)
> com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:42)
> com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:34)
> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
> 
> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:133)
> 
> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133)
> org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
> scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
> 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
> 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> 
> org.apache.spark.sql.execution.HashJoin$$anon$1.hasNext(joins.scala:101)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> 
> org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$8.apply(GeneratedAggregate.scala:198)
> 
> org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$8.apply(GeneratedAggregate.scala:165)
> org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
> org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
> 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> org.apache.spark.scheduler.Task.run(Task.scala:56)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
> 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:724)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3846) KryoException when doing joins in SparkSQL

2015-05-26 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-3846:

Priority: Blocker  (was: Major)

> KryoException when doing joins in SparkSQL 
> ---
>
> Key: SPARK-3846
> URL: https://issues.apache.org/jira/browse/SPARK-3846
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0, 1.2.0
>Reporter: Jianshi Huang
>Priority: Blocker
>
> The error is reproducible when I join two tables manually. The error message 
> is like follows.
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 645 
> in stage 3.0 failed 4 times, most recent failure: Lost task 645.3 in stage 
> 3.0 (TID 3802, ...): com.esotericsoftware.kryo.KryoException:
> Unable to find class: 
> __wrapper$1$18e31777385a452ba0bc030e899bf5d1.__wrapper$1$18e31777385a452ba0bc030e899bf5d1$SpecificRow$1
> 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
> 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
> com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721)
> com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:42)
> com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:34)
> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
> 
> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:133)
> 
> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133)
> org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
> scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
> 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
> 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> 
> org.apache.spark.sql.execution.HashJoin$$anon$1.hasNext(joins.scala:101)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> 
> org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$8.apply(GeneratedAggregate.scala:198)
> 
> org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$8.apply(GeneratedAggregate.scala:165)
> org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
> org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
> 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> org.apache.spark.scheduler.Task.run(Task.scala:56)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
> 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:724)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-3846) KryoException when doing joins in SparkSQL

2015-05-26 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14558808#comment-14558808
 ] 

Santiago M. Mola commented on SPARK-3846:
-

[~huangjs]  Would you mind adding a test case here (an example of data and 
exact code used to produce the exception)?

> KryoException when doing joins in SparkSQL 
> ---
>
> Key: SPARK-3846
> URL: https://issues.apache.org/jira/browse/SPARK-3846
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0, 1.2.0
>Reporter: Jianshi Huang
>
> The error is reproducible when I join two tables manually. The error message 
> is like follows.
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 645 
> in stage 3.0 failed 4 times, most recent failure: Lost task 645.3 in stage 
> 3.0 (TID 3802, ...): com.esotericsoftware.kryo.KryoException:
> Unable to find class: 
> __wrapper$1$18e31777385a452ba0bc030e899bf5d1.__wrapper$1$18e31777385a452ba0bc030e899bf5d1$SpecificRow$1
> 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
> 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
> com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:610)
> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:721)
> com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:42)
> com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:34)
> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
> 
> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:133)
> 
> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133)
> org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
> scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
> 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
> 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> 
> org.apache.spark.sql.execution.HashJoin$$anon$1.hasNext(joins.scala:101)
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
> 
> org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$8.apply(GeneratedAggregate.scala:198)
> 
> org.apache.spark.sql.execution.GeneratedAggregate$$anonfun$8.apply(GeneratedAggregate.scala:165)
> org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
> org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:599)
> 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
> 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
> org.apache.spark.scheduler.Task.run(Task.scala:56)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
> 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> java.lang.Thread.run(Thread.java:724)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-7823) [SQL] Batch, FixedPoint, Strategy should not be inner classes of class RuleExecutor

2015-05-26 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola resolved SPARK-7823.
-
Resolution: Duplicate

This is a duplicate of https://issues.apache.org/jira/browse/SPARK-7727

> [SQL] Batch, FixedPoint, Strategy should not be inner classes of class 
> RuleExecutor
> ---
>
> Key: SPARK-7823
> URL: https://issues.apache.org/jira/browse/SPARK-7823
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.3.1, 1.4.0
>Reporter: Edoardo Vacchi
>Priority: Minor
>
> Batch, FixedPoint, Strategy and Once are defined within the class 
> RuleExecutor[TreeType]. This makes it unnecessarily complicated to reuse batches 
> of rules within custom optimizers. E.g.:
> {code:java}
> object DefaultOptimizer extends Optimizer {
>   override val batches = /* batches defined here */
> }
> object MyCustomOptimizer extends Optimizer {
>   override val batches =
>     Batch("my custom batch" ...) ::
>     DefaultOptimizer.batches
> }
> {code}
> MyCustomOptimizer won't compile, because DefaultOptimizer.batches has type 
> "Seq[DefaultOptimizer.this.Batch]". 
> Solution: Batch, FixedPoint, etc. should be moved *outside* the 
> RuleExecutor[T] class body, either in a companion object or right in the 
> `rules` package.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-7727) Avoid inner classes in RuleExecutor

2015-05-26 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-7727:

Comment: was deleted

(was: [~evacchi] I'm sorry I opened this duplicate for: 
https://issues.apache.org/jira/browse/SPARK-7823

Not sure which one to mark as duplicate since both have pull requests.)

> Avoid inner classes in RuleExecutor
> ---
>
> Key: SPARK-7727
> URL: https://issues.apache.org/jira/browse/SPARK-7727
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.3.1
>Reporter: Santiago M. Mola
>  Labels: easyfix, starter
>
> In RuleExecutor, the following classes and objects are defined as inner 
> classes or objects: Strategy, Once, FixedPoint, Batch.
> This does not seem to accomplish anything in this case, but makes 
> extensibility harder. For example, if I want to define a new Optimizer that 
> uses all batches from the DefaultOptimizer plus some more, I would do 
> something like:
> {code}
> new Optimizer {
> override protected val batches: Seq[Batch] =
>   DefaultOptimizer.batches ++ myBatches
>  }
> {code}
> But this will give a typing error because batches in DefaultOptimizer are of 
> type DefaultOptimizer#Batch while myBatches are this#Batch.
> Workarounds include either copying the list of batches from DefaultOptimizer 
> or using a method like this:
> {code}
> private def transformBatchType(b: DefaultOptimizer.Batch): Batch = {
>   val strategy = b.strategy.maxIterations match {
> case 1 => Once
> case n => FixedPoint(n)
>   }
>   Batch(b.name, strategy, b.rules)
> }
> {code}
> However, making these classes outer would solve the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7727) Avoid inner classes in RuleExecutor

2015-05-26 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14558777#comment-14558777
 ] 

Santiago M. Mola commented on SPARK-7727:
-

[~evacchi] I'm sorry I opened this duplicate for: 
https://issues.apache.org/jira/browse/SPARK-7823

Not sure which one to mark as duplicate since both have pull requests.

> Avoid inner classes in RuleExecutor
> ---
>
> Key: SPARK-7727
> URL: https://issues.apache.org/jira/browse/SPARK-7727
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.3.1
>Reporter: Santiago M. Mola
>  Labels: easyfix, starter
>
> In RuleExecutor, the following classes and objects are defined as inner 
> classes or objects: Strategy, Once, FixedPoint, Batch.
> This does not seem to accomplish anything in this case, but makes 
> extensibility harder. For example, if I want to define a new Optimizer that 
> uses all batches from the DefaultOptimizer plus some more, I would do 
> something like:
> {code}
> new Optimizer {
> override protected val batches: Seq[Batch] =
>   DefaultOptimizer.batches ++ myBatches
>  }
> {code}
> But this will give a typing error because batches in DefaultOptimizer are of 
> type DefaultOptimizer#Batch while myBatches are this#Batch.
> Workarounds include either copying the list of batches from DefaultOptimizer 
> or using a method like this:
> {code}
> private def transformBatchType(b: DefaultOptimizer.Batch): Batch = {
>   val strategy = b.strategy.maxIterations match {
> case 1 => Once
> case n => FixedPoint(n)
>   }
>   Batch(b.name, strategy, b.rules)
> }
> {code}
> However, making these classes outer would solve the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7727) Avoid inner classes in RuleExecutor

2015-05-26 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14558758#comment-14558758
 ] 

Santiago M. Mola commented on SPARK-7727:
-

[~chenghao] I think that is a good idea. Analyzer could be converted into a 
trait, moving the current Analyzer to DefaultAnalyzer. It is probably a good idea 
to use a separate JIRA and pull request for that, though.

> Avoid inner classes in RuleExecutor
> ---
>
> Key: SPARK-7727
> URL: https://issues.apache.org/jira/browse/SPARK-7727
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.3.1
>Reporter: Santiago M. Mola
>  Labels: easyfix, starter
>
> In RuleExecutor, the following classes and objects are defined as inner 
> classes or objects: Strategy, Once, FixedPoint, Batch.
> This does not seem to accomplish anything in this case, but makes 
> extensibility harder. For example, if I want to define a new Optimizer that 
> uses all batches from the DefaultOptimizer plus some more, I would do 
> something like:
> {code}
> new Optimizer {
> override protected val batches: Seq[Batch] =
>   DefaultOptimizer.batches ++ myBatches
>  }
> {code}
> But this will give a typing error because batches in DefaultOptimizer are of 
> type DefaultOptimizer#Batch while myBatches are this#Batch.
> Workarounds include either copying the list of batches from DefaultOptimizer 
> or using a method like this:
> {code}
> private def transformBatchType(b: DefaultOptimizer.Batch): Batch = {
>   val strategy = b.strategy.maxIterations match {
> case 1 => Once
> case n => FixedPoint(n)
>   }
>   Batch(b.name, strategy, b.rules)
> }
> {code}
> However, making these classes outer would solve the problem.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-5755) remove unnecessary Add for unary plus sign

2015-05-21 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola resolved SPARK-5755.
-
   Resolution: Fixed
Fix Version/s: 1.3.0

> remove unnecessary Add for unary plus sign 
> ---
>
> Key: SPARK-5755
> URL: https://issues.apache.org/jira/browse/SPARK-5755
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Adrian Wang
>Priority: Minor
> Fix For: 1.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5755) remove unnecessary Add for unary plus sign (HiveQL)

2015-05-21 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-5755:

Summary: remove unnecessary Add for unary plus sign (HiveQL)  (was: remove 
unnecessary Add for unary plus sign )

> remove unnecessary Add for unary plus sign (HiveQL)
> ---
>
> Key: SPARK-5755
> URL: https://issues.apache.org/jira/browse/SPARK-5755
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Adrian Wang
>Priority: Minor
> Fix For: 1.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5305) Using a field in a WHERE clause that is not in the schema does not throw an exception.

2015-05-21 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555070#comment-14555070
 ] 

Santiago M. Mola commented on SPARK-5305:
-

[~sonixbp] What version were you using? Do you still experience this problem? 
It does not seem to be reproducible with recent versions.

> Using a field in a WHERE clause that is not in the schema does not throw an 
> exception.
> --
>
> Key: SPARK-5305
> URL: https://issues.apache.org/jira/browse/SPARK-5305
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Corey J. Nolet
>
> Given a schema:
> key1 = String
> key2 = Integer
> The following sql statement doesn't seem to throw an exception:
> SELECT * FROM myTable WHERE doesntExist = 'val1'



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7754) [SQL] Use PartialFunction literals instead of objects in Catalyst

2015-05-21 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554935#comment-14554935
 ] 

Santiago M. Mola commented on SPARK-7754:
-

Not all rules use transform. Some use transformUp and others use 
transformAllExpressions. Maybe this rule API could be extended to cover these 
cases.
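
For illustration only, here is a sketch of what such an extension might look 
like: extra methods inside the {{rule}} object proposed in the description. The 
names {{up}} and {{allExpressions}} are invented for this sketch.

{code}
// Hypothetical additions to the proposed `rule` helper; they mirror the
// `plan transform pf` variant above, but use transformUp (from TreeNode) and
// transformAllExpressions (from QueryPlan) instead.
def up(pf: PartialFunction[LogicalPlan, LogicalPlan]): Rule[LogicalPlan] =
  new Rule[LogicalPlan] {
    def apply(plan: LogicalPlan): LogicalPlan = plan transformUp pf
  }

def allExpressions(pf: PartialFunction[Expression, Expression]): Rule[LogicalPlan] =
  new Rule[LogicalPlan] {
    def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions pf
  }
{code}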

> [SQL] Use PartialFunction literals instead of objects in Catalyst
> -
>
> Key: SPARK-7754
> URL: https://issues.apache.org/jira/browse/SPARK-7754
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Edoardo Vacchi
>Priority: Minor
>
> Catalyst rules extend two distinct "rule" types: {{Rule[LogicalPlan]}} and 
> {{Strategy}} (which is an alias for {{GenericStrategy[SparkPlan]}}).
> The distinction is fairly subtle: in the end, both rule types are supposed to 
> define a method {{apply(plan: LogicalPlan)}}
> (where LogicalPlan is either Logical- or Spark-) which returns a transformed 
> plan (or a sequence thereof, in the case
> of Strategy).
> Ceremonies aside, the body of such a method is always of the kind:
> {code:java}
>  def apply(plan: PlanType) = plan match pf
> {code}
> where `pf` would be some `PartialFunction` of the PlanType:
> {code:java}
>   val pf = {
> case ... => ...
>   }
> {code}
> This JIRA is a proposal to introduce utility methods to
>   a) reduce the boilerplate needed to define rewrite rules
>   b) turn them back into what they essentially represent: function types.
> These changes would be backwards compatible, and would greatly help in 
> understanding what the code does. Current use of objects is redundant and 
> possibly confusing.
> *{{Rule[LogicalPlan]}}*
> a) Introduce the utility object
> {code:java}
> object rule {
>   def apply(pf: PartialFunction[LogicalPlan, LogicalPlan]): Rule[LogicalPlan] =
>     new Rule[LogicalPlan] {
>       def apply(plan: LogicalPlan): LogicalPlan = plan transform pf
>     }
>   def named(name: String)(pf: PartialFunction[LogicalPlan, LogicalPlan]): Rule[LogicalPlan] =
>     new Rule[LogicalPlan] {
>       override val ruleName = name
>       def apply(plan: LogicalPlan): LogicalPlan = plan transform pf
>     }
> }
> {code}
> b) progressively replace the boilerplate-y object definitions; e.g.
> {code:java}
> object MyRewriteRule extends Rule[LogicalPlan] {
>   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
>     case ... => ...
>   }
> }
> {code}
> with
> {code:java}
> // define a Rule[LogicalPlan]
> val MyRewriteRule = rule {
>   case ... => ...
> }
> {code}
> and/or :
> {code:java}
> // define a named Rule[LogicalPlan]
> val MyRewriteRule = rule.named("My rewrite rule") {
>   case ... => ...
> }
> {code}
> *Strategies*
> A similar solution could be applied to shorten the code for
> Strategies, which are total functions
> only because they are all supposed to manage the default case,
> possibly returning `Nil`. In this case
> we might introduce the following utility:
> {code:java}
> object strategy {
>   /**
>    * Generate a Strategy from a PartialFunction[LogicalPlan, SparkPlan].
>    * The partial function must therefore return *one single* SparkPlan for each case.
>    * The method will automatically wrap them in a [[Seq]].
>    * Unhandled cases will automatically return Seq.empty
>    */
>   def apply(pf: PartialFunction[LogicalPlan, SparkPlan]): Strategy =
>     new Strategy {
>       def apply(plan: LogicalPlan): Seq[SparkPlan] =
>         if (pf.isDefinedAt(plan)) Seq(pf.apply(plan)) else Seq.empty
>     }
>
>   /**
>    * Generate a Strategy from a PartialFunction[ LogicalPlan, Seq[SparkPlan] ].
>    * The partial function must therefore return a Seq[SparkPlan] for each case.
>    * Unhandled cases will automatically return Seq.empty
>    */
>   def seq(pf: PartialFunction[LogicalPlan, Seq[SparkPlan]]): Strategy =
>     new Strategy {
>       def apply(plan: LogicalPlan): Seq[SparkPlan] =
>         if (pf.isDefinedAt(plan)) pf.apply(plan) else Seq.empty[SparkPlan]
>     }
> }
> {code}
> Usage:
> {code:java}
> val mystrategy = strategy { case ... => ... }
> val seqstrategy = strategy.seq { case ... => ... }
> {code}
> *Further possible improvements:*
> Making the utility methods `implicit`, thereby
> further reducing the rewrite rules to:
> {code:java}
> // define a PartialFunction[LogicalPlan, LogicalPlan]
> // the implicit would convert it into a Rule[LogicalPlan] at the use sites
> val MyRewriteRule = {
>   case ... => ...
> }
> {code}
> *Caveats*
> Because of the way objects are initialized vs. vals, it might be necessary to
> reorder instructions so that vals are actually initialized before they are 
> used.
> E.g.:
> {code:java}
> class MyOptim

[jira] [Reopened] (SPARK-7724) Add support for Intersect and Except in Catalyst DSL

2015-05-21 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola reopened SPARK-7724:
-

Thanks. Here's a PR.

> Add support for Intersect and Except in Catalyst DSL
> 
>
> Key: SPARK-7724
> URL: https://issues.apache.org/jira/browse/SPARK-7724
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.3.1
>Reporter: Santiago M. Mola
>Priority: Trivial
>  Labels: easyfix, starter
>
> Catalyst DSL to create logical plans supports most of the current plan, but 
> it is missing Except and Intersect. See LogicalPlanFunctions:
> https://github.com/apache/spark/blob/6008ec14ed6491d0a854bb50548c46f2f9709269/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala#L248



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-7724) Add support for Intersect and Except in Catalyst DSL

2015-05-20 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553059#comment-14553059
 ] 

Santiago M. Mola edited comment on SPARK-7724 at 5/20/15 8:36 PM:
--

DataFrame is beyond the scope here. I do use the catalyst DSL quite intensively 
for writing test cases, so I thought that a trivial patch to complete the API 
would make sense. I can continue using Intersect and Except classes directly 
though.


was (Author: smolav):
DataFrame is beyond the scope here. I do use the catalyst DSL quite intensively 
for writing test cases, so I thought that a trivial patch to complete the API 
would make sense. I can continue using Intersect and Except clases directly 
though.

> Add support for Intersect and Except in Catalyst DSL
> 
>
> Key: SPARK-7724
> URL: https://issues.apache.org/jira/browse/SPARK-7724
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.3.1
>Reporter: Santiago M. Mola
>Priority: Trivial
>  Labels: easyfix, starter
>
> Catalyst DSL to create logical plans supports most of the current plan, but 
> it is missing Except and Intersect. See LogicalPlanFunctions:
> https://github.com/apache/spark/blob/6008ec14ed6491d0a854bb50548c46f2f9709269/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala#L248



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7724) Add support for Intersect and Except in Catalyst DSL

2015-05-20 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14553059#comment-14553059
 ] 

Santiago M. Mola commented on SPARK-7724:
-

DataFrame is beyond the scope here. I do use the catalyst DSL quite intensively 
for writing test cases, so I thought that a trivial patch to complete the API 
would make sense. I can continue using Intersect and Except clases directly 
though.

> Add support for Intersect and Except in Catalyst DSL
> 
>
> Key: SPARK-7724
> URL: https://issues.apache.org/jira/browse/SPARK-7724
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.3.1
>Reporter: Santiago M. Mola
>Priority: Trivial
>  Labels: easyfix, starter
>
> Catalyst DSL to create logical plans supports most of the current plan, but 
> it is missing Except and Intersect. See LogicalPlanFunctions:
> https://github.com/apache/spark/blob/6008ec14ed6491d0a854bb50548c46f2f9709269/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala#L248



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7727) Avoid inner classes in RuleExecutor

2015-05-19 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-7727:
---

 Summary: Avoid inner classes in RuleExecutor
 Key: SPARK-7727
 URL: https://issues.apache.org/jira/browse/SPARK-7727
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.1
Reporter: Santiago M. Mola


In RuleExecutor, the following classes and objects are defined as inner classes 
or objects: Strategy, Once, FixedPoint, Batch.

This does not seem to accomplish anything in this case, but makes extensibility 
harder. For example, if I want to define a new Optimizer that uses all batches 
from the DefaultOptimizer plus some more, I would do something like:

{code}
new Optimizer {
  override protected val batches: Seq[Batch] =
    DefaultOptimizer.batches ++ myBatches
}
{code}

But this will give a typing error because batches in DefaultOptimizer are of 
type DefaultOptimizer#Batch while myBatches are this#Batch.

Workarounds include either copying the list of batches from DefaultOptimizer or 
using a method like this:

{code}
private def transformBatchType(b: DefaultOptimizer.Batch): Batch = {
  val strategy = b.strategy.maxIterations match {
    case 1 => Once
    case n => FixedPoint(n)
  }
  Batch(b.name, strategy, b.rules: _*)
}
{code}

However, making these classes outer would solve the problem.
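
For illustration, here is a minimal, self-contained sketch (with stand-in types, 
not the actual Catalyst code) of what moving these definitions into a companion 
object could look like:

{code}
// Stand-in sketch only: Rule is simplified and RuleExecutor is reduced to the
// pieces relevant here. With Batch and friends in the companion object, every
// RuleExecutor[T] shares the same Batch type, so batches from different
// optimizers can be concatenated without the DefaultOptimizer#Batch vs.
// this#Batch mismatch.
abstract class Rule[T] { def apply(t: T): T }

object RuleExecutor {
  abstract class Strategy { def maxIterations: Int }
  case object Once extends Strategy { val maxIterations = 1 }
  case class FixedPoint(maxIterations: Int) extends Strategy
  case class Batch[T](name: String, strategy: Strategy, rules: Rule[T]*)
}

abstract class RuleExecutor[T] {
  import RuleExecutor._
  protected def batches: Seq[Batch[T]]
}
{code}

With that layout, the DefaultOptimizer.batches ++ myBatches example above would 
type-check, since both sides share the same outer Batch type.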



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7724) Add support for Intersect and Except in Catalyst DSL

2015-05-19 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-7724:
---

 Summary: Add support for Intersect and Except in Catalyst DSL
 Key: SPARK-7724
 URL: https://issues.apache.org/jira/browse/SPARK-7724
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.1
Reporter: Santiago M. Mola
Priority: Trivial


The Catalyst DSL for creating logical plans supports most of the current plan 
nodes, but it is missing Except and Intersect. See LogicalPlanFunctions:

https://github.com/apache/spark/blob/6008ec14ed6491d0a854bb50548c46f2f9709269/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala#L248
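
For reference, a sketch of what the missing operators might look like inside 
LogicalPlanFunctions, mirroring the existing helpers in that file. This is only 
a sketch against the linked code, not a merged patch; it assumes the Except and 
Intersect logical plan nodes and the logicalPlan member provided by the 
surrounding trait.

{code}
// Sketch only: to be read in the context of LogicalPlanFunctions, where
// `logicalPlan` is the plan wrapped by the DSL.
def except(otherPlan: LogicalPlan): LogicalPlan = Except(logicalPlan, otherPlan)

def intersect(otherPlan: LogicalPlan): LogicalPlan = Intersect(logicalPlan, otherPlan)
{code}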



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7275) Make LogicalRelation public

2015-05-17 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547362#comment-14547362
 ] 

Santiago M. Mola commented on SPARK-7275:
-

[~rxin] What are your thoughts on this?

> Make LogicalRelation public
> ---
>
> Key: SPARK-7275
> URL: https://issues.apache.org/jira/browse/SPARK-7275
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Santiago M. Mola
>Priority: Minor
>
> It seems LogicalRelation is the only part of the LogicalPlan that is not 
> public. This makes it harder to work with full logical plans from third party 
> packages.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6743) Join with empty projection on one side produces invalid results

2015-05-14 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543602#comment-14543602
 ] 

Santiago M. Mola commented on SPARK-6743:
-

This problem only happens for cached relations. Here is the root of the problem:

{code}
/* Fails. Got: Array(Row("A1"), Row("A2")) */
assertResult(Array(Row(), Row()))(
  InMemoryColumnarTableScan(Nil, Nil,
    sqlc.table("tab0").queryExecution.sparkPlan.asInstanceOf[InMemoryColumnarTableScan].relation)
    .execute().collect()
)
{code}

InMemoryColumnarTableScan returns the narrowest column when no attributes are 
requested:

{code}
// Find the ordinals and data types of the requested columns. If none are requested, use the
// narrowest (the field with minimum default element size).
val (requestedColumnIndices, requestedColumnDataTypes) = if (attributes.isEmpty) {
  val (narrowestOrdinal, narrowestDataType) =
    relation.output.zipWithIndex.map { case (a, ordinal) =>
      ordinal -> a.dataType
    } minBy { case (_, dataType) =>
      ColumnType(dataType).defaultSize
    }
  Seq(narrowestOrdinal) -> Seq(narrowestDataType)
} else {
  attributes.map { a =>
    relation.output.indexWhere(_.exprId == a.exprId) -> a.dataType
  }.unzip
}
{code}

It seems this is what leads to incorrect results.

> Join with empty projection on one side produces invalid results
> ---
>
> Key: SPARK-6743
> URL: https://issues.apache.org/jira/browse/SPARK-6743
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.3.0
>Reporter: Santiago M. Mola
>Priority: Critical
>
> {code:java}
> val sqlContext = new SQLContext(sc)
> val tab0 = sc.parallelize(Seq(
>   (83,0,38),
>   (26,0,79),
>   (43,81,24)
> ))
> sqlContext.registerDataFrameAsTable(sqlContext.createDataFrame(tab0), 
> "tab0")
> sqlContext.cacheTable("tab0")   
> val df1 = sqlContext.sql("SELECT tab0._2, cor0._2 FROM tab0, tab0 cor0 GROUP 
> BY tab0._2, cor0._2")
> val result1 = df1.collect()
> val df2 = sqlContext.sql("SELECT cor0._2 FROM tab0, tab0 cor0 GROUP BY 
> cor0._2")
> val result2 = df2.collect()
> val df3 = sqlContext.sql("SELECT cor0._2 FROM tab0 cor0 GROUP BY cor0._2")
> val result3 = df3.collect()
> {code}
> Given the previous code, result2 equals to Row(43), Row(83), Row(26), which 
> is wrong. These results correspond to cor0._1, instead of cor0._2. Correct 
> results would be Row(0), Row(81), which are ok for the third query. The first 
> query also produces valid results, and the only difference is that the left 
> side of the join is not empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6743) Join with empty projection on one side produces invalid results

2015-05-14 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543466#comment-14543466
 ] 

Santiago M. Mola commented on SPARK-6743:
-

Note that the bug is not related to GROUP BY; that's just a quick way to 
produce a Project logical plan with an empty projection list from SQL. Building 
upon my previous test case, here are some further instances of the bug using 
logical plans and DataFrames:

{code}
import org.apache.spark.sql.catalyst.dsl.plans._
import org.apache.spark.sql.catalyst.dsl.expressions._

val plan0 = sqlc.table("tab0").logicalPlan.subquery('tab0)
val plan1 = sqlc.table("tab1").logicalPlan.subquery('tab1)

/* Succeeds */
val planA = plan0.select('_1 as "c0")
  .join(plan1.select('_1 as "c1"))
  .select('c0, 'c1)
  .orderBy('c0.asc, 'c1.asc)
assertResult(Array(Row("A1", "B1"), Row("A1", "B2"), Row("A2", "B1"), 
Row("A2", "B2")))(DataFrame(sqlc, planA).collect())

/* Fails. Got: Array([A1], [A1], [A2], [A2]) */
val planB = plan0.select('_1 as "c0")
  .join(plan1.select('_1 as "c1"))
  .select('c1)
  .orderBy('c1.asc)
assertResult(Array(Row("B1"), Row("B1"), Row("B2"), 
Row("B2")))(DataFrame(sqlc, planB).collect())

/* Fails. Got: Array([A1], [A1], [A2], [A2]) */
val planC = plan0.select()
  .join(plan1.select('_1 as "c1"))
  .select('c1)
  .orderBy('c1.asc)
assertResult(Array(Row("B1"), Row("B1"), Row("B2"), 
Row("B2")))(DataFrame(sqlc, planC).collect())
{code}

> Join with empty projection on one side produces invalid results
> ---
>
> Key: SPARK-6743
> URL: https://issues.apache.org/jira/browse/SPARK-6743
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.3.0
>Reporter: Santiago M. Mola
>Priority: Critical
>
> {code:java}
> val sqlContext = new SQLContext(sc)
> val tab0 = sc.parallelize(Seq(
>   (83,0,38),
>   (26,0,79),
>   (43,81,24)
> ))
> sqlContext.registerDataFrameAsTable(sqlContext.createDataFrame(tab0), 
> "tab0")
> sqlContext.cacheTable("tab0")   
> val df1 = sqlContext.sql("SELECT tab0._2, cor0._2 FROM tab0, tab0 cor0 GROUP 
> BY tab0._2, cor0._2")
> val result1 = df1.collect()
> val df2 = sqlContext.sql("SELECT cor0._2 FROM tab0, tab0 cor0 GROUP BY 
> cor0._2")
> val result2 = df2.collect()
> val df3 = sqlContext.sql("SELECT cor0._2 FROM tab0 cor0 GROUP BY cor0._2")
> val result3 = df3.collect()
> {code}
> Given the previous code, result2 equals to Row(43), Row(83), Row(26), which 
> is wrong. These results correspond to cor0._1, instead of cor0._2. Correct 
> results would be Row(0), Row(81), which are ok for the third query. The first 
> query also produces valid results, and the only difference is that the left 
> side of the join is not empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6743) Join with empty projection on one side produces invalid results

2015-05-14 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543435#comment-14543435
 ] 

Santiago M. Mola commented on SPARK-6743:
-

Sorry, my first example was not very clear. Here is a more precise one:

{code}
 val sqlc = new SQLContext(sc)

val tab0 = sc.parallelize(Seq(
  Tuple1("A1"),
  Tuple1("A2")
))
sqlc.registerDataFrameAsTable(sqlc.createDataFrame(tab0), "tab0")
sqlc.cacheTable("tab0")

val tab1 = sc.parallelize(Seq(
  Tuple1("B1"),
  Tuple1("B2")
))
sqlc.registerDataFrameAsTable(sqlc.createDataFrame(tab1), "tab1")
sqlc.cacheTable("tab1")

/* Succeeds */
val result1 = sqlc.sql("SELECT tab0._1,tab1._1 FROM tab0, tab1 GROUP BY 
tab0._1,tab1._1 ORDER BY tab0._1, tab1._1").collect()
assertResult(Array(Row("A1", "B1"), Row("A1", "B2"), Row("A2", "B1"), 
Row("A2", "B2")))(result1)

/* Fails. Got: Array([A1], [A2]) */
val result2 = sqlc.sql("SELECT tab1._1 FROM tab0, tab1 GROUP BY tab1._1 
ORDER BY tab1._1").collect()
assertResult(Array(Row("B1"), Row("B2")))(result2)
{code}

> Join with empty projection on one side produces invalid results
> ---
>
> Key: SPARK-6743
> URL: https://issues.apache.org/jira/browse/SPARK-6743
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.3.0
>Reporter: Santiago M. Mola
>Priority: Critical
>
> {code:java}
> val sqlContext = new SQLContext(sc)
> val tab0 = sc.parallelize(Seq(
>   (83,0,38),
>   (26,0,79),
>   (43,81,24)
> ))
> sqlContext.registerDataFrameAsTable(sqlContext.createDataFrame(tab0), 
> "tab0")
> sqlContext.cacheTable("tab0")   
> val df1 = sqlContext.sql("SELECT tab0._2, cor0._2 FROM tab0, tab0 cor0 GROUP 
> BY tab0._2, cor0._2")
> val result1 = df1.collect()
> val df2 = sqlContext.sql("SELECT cor0._2 FROM tab0, tab0 cor0 GROUP BY 
> cor0._2")
> val result2 = df2.collect()
> val df3 = sqlContext.sql("SELECT cor0._2 FROM tab0 cor0 GROUP BY cor0._2")
> val result3 = df3.collect()
> {code}
> Given the previous code, result2 equals to Row(43), Row(83), Row(26), which 
> is wrong. These results correspond to cor0._1, instead of cor0._2. Correct 
> results would be Row(0), Row(81), which are ok for the third query. The first 
> query also produces valid results, and the only difference is that the left 
> side of the join is not empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7012) Add support for NOT NULL modifier for column definitions on DDLParser

2015-05-14 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543403#comment-14543403
 ] 

Santiago M. Mola commented on SPARK-7012:
-

In Spark SQL, every expression can be nullable or not (i.e. values can be null 
or not). All Spark SQL and Catalyst internals support specifying this.

See, for example, StructField, which is the relevant class for schemas:
https://github.com/apache/spark/blob/2d6612cc8b98f767d73c4d15e4065bf3d6c12ea7/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructField.scala#L31

Or AttributeReference:
https://github.com/apache/spark/blob/c1080b6fddb22d84694da2453e46a03fbc041576/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala#L166

However, when creating a temporary table through a SQL statement (CREATE 
TEMPORARY TABLE), there is no way of specifying if a column is nullable or not 
(it will always be nullable by default).

Standard SQL supports a constraint called "NOT NULL" to specify that a column 
is not nullable. See:
http://www.w3schools.com/sql/sql_notnull.asp

In order to implement this, the parser for "CREATE TEMPORARY TABLE", that is, 
DDLParser, should be modified to allow "NOT NULL" and set nullable = false 
accordingly in StructField. See:
https://github.com/apache/spark/blob/0595b6de8f1da04baceda082553c2aa1aa2cb006/sql/core/src/main/scala/org/apache/spark/sql/sources/ddl.scala#L176
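
As a rough sketch only (this is not the actual DDLParser code, and it assumes 
NOT and NULL keywords declared the same way as the parser's existing keywords), 
the column production could look something like this:

{code}
// Hypothetical shape of the column rule with an optional NOT NULL suffix;
// when the modifier is present, the StructField is created with nullable = false.
protected lazy val column: Parser[StructField] =
  ident ~ dataType ~ opt(NOT ~> NULL) ^^ {
    case columnName ~ typ ~ notNull =>
      StructField(columnName, typ, nullable = notNull.isEmpty)
  }
{code}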


> Add support for NOT NULL modifier for column definitions on DDLParser
> -
>
> Key: SPARK-7012
> URL: https://issues.apache.org/jira/browse/SPARK-7012
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.3.0
>Reporter: Santiago M. Mola
>Priority: Minor
>  Labels: easyfix
>
> Add support for NOT NULL modifier for column definitions on DDLParser. This 
> would add support for the following syntax:
> CREATE TEMPORARY TABLE (field INTEGER NOT NULL) ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4758) Make metastore_db in-memory for HiveContext

2015-05-13 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541865#comment-14541865
 ] 

Santiago M. Mola commented on SPARK-4758:
-

This could also make testing more convenient. Is there any progress on this?

> Make metastore_db in-memory for HiveContext
> ---
>
> Key: SPARK-4758
> URL: https://issues.apache.org/jira/browse/SPARK-4758
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.2.0, 1.3.0
>Reporter: Jianshi Huang
>Priority: Minor
>
> HiveContext by default will create a local folder metastore_db.
> This is not very user friendly, as the metastore_db will be locked by 
> HiveContext and thus will block multiple Spark processes from starting from the 
> same directory.
> I would propose adding a default hive-site.xml in conf/ with the following 
> content.
> <configuration>
>   <property>
>     <name>javax.jdo.option.ConnectionURL</name>
>     <value>jdbc:derby:memory:databaseName=metastore_db;create=true</value>
>   </property>
>   <property>
>     <name>javax.jdo.option.ConnectionDriverName</name>
>     <value>org.apache.derby.jdbc.EmbeddedDriver</value>
>   </property>
>   <property>
>     <name>hive.metastore.warehouse.dir</name>
>     <value>file://${user.dir}/hive/warehouse</value>
>   </property>
> </configuration>
> jdbc:derby:memory:databaseName=metastore_db;create=true Will make sure the 
> embedded derby database is created in-memory.
> Jianshi



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7566) HiveContext.analyzer cannot be overriden

2015-05-12 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-7566:
---

 Summary: HiveContext.analyzer cannot be overriden
 Key: SPARK-7566
 URL: https://issues.apache.org/jira/browse/SPARK-7566
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.1
Reporter: Santiago M. Mola


Trying to override HiveContext.analyzer will give the following compilation 
error:

{code}
Error:(51, 36) overriding lazy value analyzer in class HiveContext of type 
org.apache.spark.sql.catalyst.analysis.Analyzer{val extendedResolutionRules: 
List[org.apache.spark.sql.catalyst.rules.Rule[org.apache.spark.sql.catalyst.plans.logical.LogicalPlan]]};
 lazy value analyzer has incompatible type
  override protected[sql] lazy val analyzer: Analyzer = {
   ^
{code}

That is because the type changed inadvertently when the explicit return type 
annotation was omitted.
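
For reference, here is a minimal, self-contained Scala example (unrelated to 
Spark's actual classes) that reproduces the same inference pitfall:

{code}
class Base

class A {
  // No explicit type annotation: the inferred type is the refinement
  // Base{val extra: Int}, not Base itself.
  lazy val member = new Base { val extra = 1 }
}

class B extends A {
  // Fails to compile: the override has an incompatible (wider) type,
  // which is the same kind of error as quoted above.
  // override lazy val member: Base = new Base
}

class AFixed {
  // Declaring the return type explicitly keeps the member overridable.
  lazy val member: Base = new Base { val extra = 1 }
}
{code}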



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6743) Join with empty projection on one side produces invalid results

2015-05-11 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538737#comment-14538737
 ] 

Santiago M. Mola commented on SPARK-6743:
-

Any thoughts on this?

> Join with empty projection on one side produces invalid results
> ---
>
> Key: SPARK-6743
> URL: https://issues.apache.org/jira/browse/SPARK-6743
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.3.0
>Reporter: Santiago M. Mola
>Priority: Critical
>
> {code:java}
> val sqlContext = new SQLContext(sc)
> val tab0 = sc.parallelize(Seq(
>   (83,0,38),
>   (26,0,79),
>   (43,81,24)
> ))
> sqlContext.registerDataFrameAsTable(sqlContext.createDataFrame(tab0), 
> "tab0")
> sqlContext.cacheTable("tab0")   
> val df1 = sqlContext.sql("SELECT tab0._2, cor0._2 FROM tab0, tab0 cor0 GROUP 
> BY tab0._2, cor0._2")
> val result1 = df1.collect()
> val df2 = sqlContext.sql("SELECT cor0._2 FROM tab0, tab0 cor0 GROUP BY 
> cor0._2")
> val result2 = df2.collect()
> val df3 = sqlContext.sql("SELECT cor0._2 FROM tab0 cor0 GROUP BY cor0._2")
> val result3 = df3.collect()
> {code}
> Given the previous code, result2 equals to Row(43), Row(83), Row(26), which 
> is wrong. These results correspond to cor0._1, instead of cor0._2. Correct 
> results would be Row(0), Row(81), which are ok for the third query. The first 
> query also produces valid results, and the only difference is that the left 
> side of the join is not empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7088) [REGRESSION] Spark 1.3.1 breaks analysis of third-party logical plans

2015-05-11 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538734#comment-14538734
 ] 

Santiago M. Mola commented on SPARK-7088:
-

Any thoughts on this?

> [REGRESSION] Spark 1.3.1 breaks analysis of third-party logical plans
> -
>
> Key: SPARK-7088
> URL: https://issues.apache.org/jira/browse/SPARK-7088
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.3.1
>Reporter: Santiago M. Mola
>Priority: Critical
>  Labels: regression
>
> We're using some custom logical plans. We are now migrating from Spark 1.3.0 
> to 1.3.1 and found a few incompatible API changes. All of them seem to be in 
> internal code, so we understand that. But now the ResolveReferences rule, 
> that used to work with third-party logical plans just does not work, without 
> any possible workaround that I'm aware other than just copying 
> ResolveReferences rule and using it with our own fix.
> The change in question is this section of code:
> {code}
> }.headOption.getOrElse { // Only handle first case, others will be 
> fixed on the next pass.
>   sys.error(
> s"""
>   |Failure when resolving conflicting references in Join:
>   |$plan
>   |
>   |Conflicting attributes: ${conflictingAttributes.mkString(",")}
>   """.stripMargin)
> }
> {code}
> Which causes the following error on analysis:
> {code}
> Failure when resolving conflicting references in Join:
> 'Project ['l.name,'r.name,'FUNC1('l.node,'r.node) AS 
> c2#37,'FUNC2('l.node,'r.node) AS c3#38,'FUNC3('r.node,'l.node) AS c4#39]
>  'Join Inner, None
>   Subquery l
>Subquery h
> Project [name#12,node#36]
>  CustomPlan H, u, (p#13L = s#14L), [ord#15 ASC], IS NULL p#13L, node#36
>   Subquery v
>Subquery h_src
> LogicalRDD [name#12,p#13L,s#14L,ord#15], MapPartitionsRDD[1] at 
> mapPartitions at ExistingRDD.scala:37
>   Subquery r
>Subquery h
> Project [name#40,node#36]
>  CustomPlan H, u, (p#41L = s#42L), [ord#43 ASC], IS NULL pred#41L, node#36
>   Subquery v
>Subquery h_src
> LogicalRDD [name#40,p#41L,s#42L,ord#43], MapPartitionsRDD[1] at 
> mapPartitions at ExistingRDD.scala:37
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-7275) Make LogicalRelation public

2015-05-07 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532159#comment-14532159
 ] 

Santiago M. Mola commented on SPARK-7275:
-

[~gweidner] I work on a project that extends Spark SQL with a richer data 
sources API. One such extension is the ability to push down a subtree of 
the logical plan in full to a data source. Data sources implementing this API 
must be able to inspect the LogicalPlan they're given, and that includes 
matching LogicalRelation. If a data source is in its own Java package (i.e. not 
org.apache.spark.sql), which is the usual case, it will not be able to match a 
LogicalRelation out of the box. Currently, I have implemented a workaround by adding 
a public extractor IsLogicalRelation in the org.apache.spark.sql package that 
proxies LogicalRelation to outside packages... which is, of course, an ugly 
hack.

Note that LogicalRelation is the only element of the logical plan which is not 
public.
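
For context, here is a rough sketch of what such a proxy extractor looks like. 
The package and object name are as described above; the import path and the 
LogicalRelation constructor shape are assumptions based on Spark 1.3 and may 
differ.

{code}
package org.apache.spark.sql

import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.sources.{BaseRelation, LogicalRelation}

// Re-exposes the private[sql] LogicalRelation as a public extractor so that
// third-party packages can pattern match on it.
object IsLogicalRelation {
  def unapply(plan: LogicalPlan): Option[BaseRelation] = plan match {
    case LogicalRelation(relation) => Some(relation)
    case _ => None
  }
}
{code}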

> Make LogicalRelation public
> ---
>
> Key: SPARK-7275
> URL: https://issues.apache.org/jira/browse/SPARK-7275
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Santiago M. Mola
>Priority: Minor
>
> It seems LogicalRelation is the only part of the LogicalPlan that is not 
> public. This makes it harder to work with full logical plans from third party 
> packages.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7275) Make LogicalRelation public

2015-04-30 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-7275:
---

 Summary: Make LogicalRelation public
 Key: SPARK-7275
 URL: https://issues.apache.org/jira/browse/SPARK-7275
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Santiago M. Mola
Priority: Minor


It seems LogicalRelation is the only part of the LogicalPlan that is not 
public. This makes it harder to work with full logical plans from third party 
packages.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7088) [REGRESSION] Spark 1.3.1 breaks analysis of third-party logical plans

2015-04-23 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-7088:

Labels: regression  (was: )

> [REGRESSION] Spark 1.3.1 breaks analysis of third-party logical plans
> -
>
> Key: SPARK-7088
> URL: https://issues.apache.org/jira/browse/SPARK-7088
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.3.1
>Reporter: Santiago M. Mola
>Priority: Critical
>  Labels: regression
>
> We're using some custom logical plans. We are now migrating from Spark 1.3.0 
> to 1.3.1 and found a few incompatible API changes. All of them seem to be in 
> internal code, so we understand that. But now the ResolveReferences rule, 
> that used to work with third-party logical plans just does not work, without 
> any possible workaround that I'm aware other than just copying 
> ResolveReferences rule and using it with our own fix.
> The change in question is this section of code:
> {code}
> }.headOption.getOrElse { // Only handle first case, others will be 
> fixed on the next pass.
>   sys.error(
> s"""
>   |Failure when resolving conflicting references in Join:
>   |$plan
>   |
>   |Conflicting attributes: ${conflictingAttributes.mkString(",")}
>   """.stripMargin)
> }
> {code}
> Which causes the following error on analysis:
> {code}
> Failure when resolving conflicting references in Join:
> 'Project ['l.name,'r.name,'FUNC1('l.node,'r.node) AS 
> c2#37,'FUNC2('l.node,'r.node) AS c3#38,'FUNC3('r.node,'l.node) AS c4#39]
>  'Join Inner, None
>   Subquery l
>Subquery h
> Project [name#12,node#36]
>  CustomPlan H, u, (p#13L = s#14L), [ord#15 ASC], IS NULL p#13L, node#36
>   Subquery v
>Subquery h_src
> LogicalRDD [name#12,p#13L,s#14L,ord#15], MapPartitionsRDD[1] at 
> mapPartitions at ExistingRDD.scala:37
>   Subquery r
>Subquery h
> Project [name#40,node#36]
>  CustomPlan H, u, (p#41L = s#42L), [ord#43 ASC], IS NULL pred#41L, node#36
>   Subquery v
>Subquery h_src
> LogicalRDD [name#40,p#41L,s#42L,ord#43], MapPartitionsRDD[1] at 
> mapPartitions at ExistingRDD.scala:37
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7088) [REGRESSION] Spark 1.3.1 breaks analysis of third-party logical plans

2015-04-23 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-7088:

Description: 
We're using some custom logical plans. We are now migrating from Spark 1.3.0 to 
1.3.1 and found a few incompatible API changes. All of them seem to be in 
internal code, so we understand that. But now the ResolveReferences rule, that 
used to work with third-party logical plans just does not work, without any 
possible workaround that I'm aware other than just copying ResolveReferences 
rule and using it with our own fix.

The change in question is this section of code:
{code}
}.headOption.getOrElse { // Only handle first case, others will be 
fixed on the next pass.
  sys.error(
s"""
  |Failure when resolving conflicting references in Join:
  |$plan
  |
  |Conflicting attributes: ${conflictingAttributes.mkString(",")}
  """.stripMargin)
}
{code}

Which causes the following error on analysis:

{code}
Failure when resolving conflicting references in Join:
'Project ['l.name,'r.name,'FUNC1('l.node,'r.node) AS 
c2#37,'FUNC2('l.node,'r.node) AS c3#38,'FUNC3('r.node,'l.node) AS c4#39]
 'Join Inner, None
  Subquery l
   Subquery h
Project [name#12,node#36]
 CustomPlan H, u, (p#13L = s#14L), [ord#15 ASC], IS NULL p#13L, node#36
  Subquery v
   Subquery h_src
LogicalRDD [name#12,p#13L,s#14L,ord#15], MapPartitionsRDD[1] at 
mapPartitions at ExistingRDD.scala:37
  Subquery r
   Subquery h
Project [name#40,node#36]
 CustomPlan H, u, (p#41L = s#42L), [ord#43 ASC], IS NULL pred#41L, node#36
  Subquery v
   Subquery h_src
LogicalRDD [name#40,p#41L,s#42L,ord#43], MapPartitionsRDD[1] at 
mapPartitions at ExistingRDD.scala:37
{code}

  was:
We're using some custom logical plans. We are now migrating from Spark 1.3.0 to 
1.3.1 and found a few incompatible API changes. All of them seem to be in 
internal code, so we understand that. But now the ResolveReferences rule, that 
used to work with third-party logical plans just does not work, without any 
possible workaround that I'm aware other than just copying ResolveReferences 
rule and using it with our own fix.

The change in question is this section of code:
{code}
}.headOption.getOrElse { // Only handle first case, others will be 
fixed on the next pass.
  sys.error(
s"""
  |Failure when resolving conflicting references in Join:
  |$plan
  |
  |Conflicting attributes: ${conflictingAttributes.mkString(",")}
  """.stripMargin)
}
{code}

Which causes the following error on analysis:

{code}
Failure when resolving conflicting references in Join:
'Project ['l.name,'r.name,'IS_DESCENDANT('l.node,'r.node) AS 
c2#37,'IS_DESCENDANT_OR_SELF('l.node,'r.node) AS 
c3#38,'IS_PARENT('r.node,'l.node) AS c4#39]
 'Join Inner, None
  Subquery l
   Subquery h
Project [name#12,node#36]
 CustomPlan H, u, (p#13L = s#14L), [ord#15 ASC], IS NULL p#13L, node#36
  Subquery v
   Subquery h_src
LogicalRDD [name#12,p#13L,s#14L,ord#15], MapPartitionsRDD[1] at 
mapPartitions at ExistingRDD.scala:37
  Subquery r
   Subquery h
Project [name#40,node#36]
 CustomPlan H, u, (p#41L = s#42L), [ord#43 ASC], IS NULL pred#41L, node#36
  Subquery v
   Subquery h_src
LogicalRDD [name#40,p#41L,s#42L,ord#43], MapPartitionsRDD[1] at 
mapPartitions at ExistingRDD.scala:37
{code}


> [REGRESSION] Spark 1.3.1 breaks analysis of third-party logical plans
> -
>
> Key: SPARK-7088
> URL: https://issues.apache.org/jira/browse/SPARK-7088
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.3.1
>Reporter: Santiago M. Mola
>Priority: Critical
>
> We're using some custom logical plans. We are now migrating from Spark 1.3.0 
> to 1.3.1 and found a few incompatible API changes. All of them seem to be in 
> internal code, so we understand that. But now the ResolveReferences rule, 
> that used to work with third-party logical plans just does not work, without 
> any possible workaround that I'm aware other than just copying 
> ResolveReferences rule and using it with our own fix.
> The change in question is this section of code:
> {code}
> }.headOption.getOrElse { // Only handle first case, others will be 
> fixed on the next pass.
>   sys.error(
> s"""
>   |Failure when resolving conflicting references in Join:
>   |$plan
>   |
>   |Conflicting attributes: ${conflictingAttributes.mkString(",")}
>   """.stripMargin)
> }
> {code}
> Which causes the following error on a

[jira] [Updated] (SPARK-7088) [REGRESSION] Spark 1.3.1 breaks analysis of third-party logical plans

2015-04-23 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-7088:

Description: 
We're using some custom logical plans. We are now migrating from Spark 1.3.0 to 
1.3.1 and found a few incompatible API changes. All of them seem to be in 
internal code, so we understand that. But now the ResolveReferences rule, that 
used to work with third-party logical plans just does not work, without any 
possible workaround that I'm aware other than just copying ResolveReferences 
rule and using it with our own fix.

The change in question is this section of code:
{code}
}.headOption.getOrElse { // Only handle first case, others will be 
fixed on the next pass.
  sys.error(
s"""
  |Failure when resolving conflicting references in Join:
  |$plan
  |
  |Conflicting attributes: ${conflictingAttributes.mkString(",")}
  """.stripMargin)
}
{code}

Which causes the following error on analysis:

{code}
Failure when resolving conflicting references in Join:
'Project ['l.name,'r.name,'IS_DESCENDANT('l.node,'r.node) AS 
c2#37,'IS_DESCENDANT_OR_SELF('l.node,'r.node) AS 
c3#38,'IS_PARENT('r.node,'l.node) AS c4#39]
 'Join Inner, None
  Subquery l
   Subquery h
Project [name#12,node#36]
 CustomPlan H, u, (p#13L = s#14L), [ord#15 ASC], IS NULL p#13L, node#36
  Subquery v
   Subquery h_src
LogicalRDD [name#12,p#13L,s#14L,ord#15], MapPartitionsRDD[1] at 
mapPartitions at ExistingRDD.scala:37
  Subquery r
   Subquery h
Project [name#40,node#36]
 CustomPlan H, u, (p#41L = s#42L), [ord#43 ASC], IS NULL pred#41L, node#36
  Subquery v
   Subquery h_src
LogicalRDD [name#40,p#41L,s#42L,ord#43], MapPartitionsRDD[1] at 
mapPartitions at ExistingRDD.scala:37
{code}

  was:
We're using some custom logical plans. We are now migrating from Spark 1.3.0 to 
1.3.1 and found a few incompatible API changes. All of them seem to be in 
internal code, so we understand that. But now the ResolveReferences rule, that 
used to work with third-party logical plans just does not work, without any 
possible workaround that I'm aware other than just copying ResolveReferences 
rule and using it with our own fix.

The change in question is this section of code:

}.headOption.getOrElse { // Only handle first case, others will be 
fixed on the next pass.
  sys.error(
s"""
  |Failure when resolving conflicting references in Join:
  |$plan
  |
  |Conflicting attributes: ${conflictingAttributes.mkString(",")}
  """.stripMargin)
}


Which causes the following error on analysis:

Failure when resolving conflicting references in Join:
'Project ['l.name,'r.name,'IS_DESCENDANT('l.node,'r.node) AS 
c2#37,'IS_DESCENDANT_OR_SELF('l.node,'r.node) AS 
c3#38,'IS_PARENT('r.node,'l.node) AS c4#39]
 'Join Inner, None
  Subquery l
   Subquery h
Project [name#12,node#36]
 CustomPlan H, u, (p#13L = s#14L), [ord#15 ASC], IS NULL p#13L, node#36
  Subquery v
   Subquery h_src
LogicalRDD [name#12,p#13L,s#14L,ord#15], MapPartitionsRDD[1] at 
mapPartitions at ExistingRDD.scala:37
  Subquery r
   Subquery h
Project [name#40,node#36]
 CustomPlan H, u, (p#41L = s#42L), [ord#43 ASC], IS NULL pred#41L, node#36
  Subquery v
   Subquery h_src
LogicalRDD [name#40,p#41L,s#42L,ord#43], MapPartitionsRDD[1] at 
mapPartitions at ExistingRDD.scala:37



> [REGRESSION] Spark 1.3.1 breaks analysis of third-party logical plans
> -
>
> Key: SPARK-7088
> URL: https://issues.apache.org/jira/browse/SPARK-7088
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.3.1
>Reporter: Santiago M. Mola
>Priority: Critical
>
> We're using some custom logical plans. We are now migrating from Spark 1.3.0 
> to 1.3.1 and found a few incompatible API changes. All of them seem to be in 
> internal code, so we understand that. But now the ResolveReferences rule, 
> that used to work with third-party logical plans just does not work, without 
> any possible workaround that I'm aware other than just copying 
> ResolveReferences rule and using it with our own fix.
> The change in question is this section of code:
> {code}
> }.headOption.getOrElse { // Only handle first case, others will be 
> fixed on the next pass.
>   sys.error(
> s"""
>   |Failure when resolving conflicting references in Join:
>   |$plan
>   |
>   |Conflicting attributes: ${conflictingAttributes.mkString(",")}
>   """.stripMargin)
> }
> {code}
> Which causes the following error 

[jira] [Created] (SPARK-7088) [REGRESSION] Spark 1.3.1 breaks analysis of third-party logical plans

2015-04-23 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-7088:
---

 Summary: [REGRESSION] Spark 1.3.1 breaks analysis of third-party 
logical plans
 Key: SPARK-7088
 URL: https://issues.apache.org/jira/browse/SPARK-7088
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.1
Reporter: Santiago M. Mola
Priority: Critical


We're using some custom logical plans. We are now migrating from Spark 1.3.0 to 
1.3.1 and found a few incompatible API changes. All of them seem to be in 
internal code, so we understand at. But now the ResolveReferences rule, that 
used to work with third-party logical plans just does not work, without any 
possible workaround that I'm aware other than just copying ResolveReferences 
rule and using it with our own fix.

The change in question is this section of code:

}.headOption.getOrElse { // Only handle first case, others will be 
fixed on the next pass.
  sys.error(
s"""
  |Failure when resolving conflicting references in Join:
  |$plan
  |
  |Conflicting attributes: ${conflictingAttributes.mkString(",")}
  """.stripMargin)
}


Which causes the following error on analysis:

Failure when resolving conflicting references in Join:
'Project ['l.name,'r.name,'IS_DESCENDANT('l.node,'r.node) AS 
c2#37,'IS_DESCENDANT_OR_SELF('l.node,'r.node) AS 
c3#38,'IS_PARENT('r.node,'l.node) AS c4#39]
 'Join Inner, None
  Subquery l
   Subquery h
Project [name#12,node#36]
 CustomPlan H, u, (p#13L = s#14L), [ord#15 ASC], IS NULL p#13L, node#36
  Subquery v
   Subquery h_src
LogicalRDD [name#12,p#13L,s#14L,ord#15], MapPartitionsRDD[1] at 
mapPartitions at ExistingRDD.scala:37
  Subquery r
   Subquery h
Project [name#40,node#36]
 CustomPlan H, u, (p#41L = s#42L), [ord#43 ASC], IS NULL pred#41L, node#36
  Subquery v
   Subquery h_src
LogicalRDD [name#40,p#41L,s#42L,ord#43], MapPartitionsRDD[1] at 
mapPartitions at ExistingRDD.scala:37




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-7088) [REGRESSION] Spark 1.3.1 breaks analysis of third-party logical plans

2015-04-23 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-7088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-7088:

Description: 
We're using some custom logical plans. We are now migrating from Spark 1.3.0 to 
1.3.1 and found a few incompatible API changes. All of them seem to be in 
internal code, so we understand that. But now the ResolveReferences rule, which 
used to work with third-party logical plans, no longer works, and the only 
workaround I'm aware of is copying the ResolveReferences rule and using it with 
our own fix.

The change in question is this section of code:

}.headOption.getOrElse { // Only handle first case, others will be 
fixed on the next pass.
  sys.error(
s"""
  |Failure when resolving conflicting references in Join:
  |$plan
  |
  |Conflicting attributes: ${conflictingAttributes.mkString(",")}
  """.stripMargin)
}


Which causes the following error on analysis:

Failure when resolving conflicting references in Join:
'Project ['l.name,'r.name,'IS_DESCENDANT('l.node,'r.node) AS 
c2#37,'IS_DESCENDANT_OR_SELF('l.node,'r.node) AS 
c3#38,'IS_PARENT('r.node,'l.node) AS c4#39]
 'Join Inner, None
  Subquery l
   Subquery h
Project [name#12,node#36]
 CustomPlan H, u, (p#13L = s#14L), [ord#15 ASC], IS NULL p#13L, node#36
  Subquery v
   Subquery h_src
LogicalRDD [name#12,p#13L,s#14L,ord#15], MapPartitionsRDD[1] at 
mapPartitions at ExistingRDD.scala:37
  Subquery r
   Subquery h
Project [name#40,node#36]
 CustomPlan H, u, (p#41L = s#42L), [ord#43 ASC], IS NULL pred#41L, node#36
  Subquery v
   Subquery h_src
LogicalRDD [name#40,p#41L,s#42L,ord#43], MapPartitionsRDD[1] at 
mapPartitions at ExistingRDD.scala:37


  was:
We're using some custom logical plans. We are now migrating from Spark 1.3.0 to 
1.3.1 and found a few incompatible API changes. All of them seem to be in 
internal code, so we understand at. But now the ResolveReferences rule, that 
used to work with third-party logical plans just does not work, without any 
possible workaround that I'm aware other than just copying ResolveReferences 
rule and using it with our own fix.

The change in question is this section of code:

}.headOption.getOrElse { // Only handle first case, others will be 
fixed on the next pass.
  sys.error(
s"""
  |Failure when resolving conflicting references in Join:
  |$plan
  |
  |Conflicting attributes: ${conflictingAttributes.mkString(",")}
  """.stripMargin)
}


Which causes the following error on analysis:

Failure when resolving conflicting references in Join:
'Project ['l.name,'r.name,'IS_DESCENDANT('l.node,'r.node) AS 
c2#37,'IS_DESCENDANT_OR_SELF('l.node,'r.node) AS 
c3#38,'IS_PARENT('r.node,'l.node) AS c4#39]
 'Join Inner, None
  Subquery l
   Subquery h
Project [name#12,node#36]
 CustomPlan H, u, (p#13L = s#14L), [ord#15 ASC], IS NULL p#13L, node#36
  Subquery v
   Subquery h_src
LogicalRDD [name#12,p#13L,s#14L,ord#15], MapPartitionsRDD[1] at 
mapPartitions at ExistingRDD.scala:37
  Subquery r
   Subquery h
Project [name#40,node#36]
 CustomPlan H, u, (p#41L = s#42L), [ord#43 ASC], IS NULL pred#41L, node#36
  Subquery v
   Subquery h_src
LogicalRDD [name#40,p#41L,s#42L,ord#43], MapPartitionsRDD[1] at 
mapPartitions at ExistingRDD.scala:37



> [REGRESSION] Spark 1.3.1 breaks analysis of third-party logical plans
> -
>
> Key: SPARK-7088
> URL: https://issues.apache.org/jira/browse/SPARK-7088
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.3.1
>Reporter: Santiago M. Mola
>Priority: Critical
>
> We're using some custom logical plans. We are now migrating from Spark 1.3.0 
> to 1.3.1 and found a few incompatible API changes. All of them seem to be in 
> internal code, so we understand that. But now the ResolveReferences rule, 
> which used to work with third-party logical plans, no longer works, and the 
> only workaround I'm aware of is copying the ResolveReferences rule and using 
> it with our own fix.
> The change in question is this section of code:
> }.headOption.getOrElse { // Only handle first case, others will be 
> fixed on the next pass.
>   sys.error(
> s"""
>   |Failure when resolving conflicting references in Join:
>   |$plan
>   |
>   |Conflicting attributes: ${conflictingAttributes.mkString(",")}
>   """.stripMargin)
> }
> Which causes the following error on analysis:
> Failure when resolving conflic

[jira] [Created] (SPARK-7034) Support escaped double quotes on data source options

2015-04-21 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-7034:
---

 Summary: Support escaped double quotes on data source options
 Key: SPARK-7034
 URL: https://issues.apache.org/jira/browse/SPARK-7034
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Santiago M. Mola
Priority: Minor


Currently, this is not supported:

CREATE TEMPORARY TABLE t
USING my.data.source
OPTIONS (
  myFancyOption "with \"escaped\" double quotes"
);

It will produce a parsing error.
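
For illustration, a minimal sketch of the intended behaviour once escaping is 
supported (the same statement as above, issued through a SQLContext; 
my.data.source and myFancyOption are the illustrative names from the example):

{code}
sqlContext.sql(
  """CREATE TEMPORARY TABLE t
    |USING my.data.source
    |OPTIONS (
    |  myFancyOption "with \"escaped\" double quotes"
    |)""".stripMargin)
// Desired: the data source receives the option value
//   with "escaped" double quotes
// instead of the statement failing to parse.
{code}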



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-7012) Add support for NOT NULL modifier for column definitions on DDLParser

2015-04-20 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-7012:
---

 Summary: Add support for NOT NULL modifier for column definitions 
on DDLParser
 Key: SPARK-7012
 URL: https://issues.apache.org/jira/browse/SPARK-7012
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.0
Reporter: Santiago M. Mola
Priority: Minor


Add support for the NOT NULL modifier in column definitions in DDLParser. This 
would enable the following syntax:

CREATE TEMPORARY TABLE (field INTEGER NOT NULL) ...
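
For illustration, a rough sketch (not the actual Spark source) of how a 
parser-combinator column production could accept the modifier, assuming NOT and 
NULL are registered as keywords:

{code}
protected lazy val column: Parser[StructField] =
  ident ~ dataType ~ opt(NOT ~> NULL) ^^ {
    case name ~ typ ~ notNull =>
      // the column is nullable only when no NOT NULL modifier was given
      StructField(name, typ, nullable = notNull.isEmpty)
  }
{code}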



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6874) Add support for SQL:2003 array type declaration syntax

2015-04-12 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-6874:
---

 Summary: Add support for SQL:2003 array type declaration syntax
 Key: SPARK-6874
 URL: https://issues.apache.org/jira/browse/SPARK-6874
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.0
Reporter: Santiago M. Mola
Priority: Minor


As of SQL:2003, arrays are standard SQL types. However, the standard declaration 
syntax differs from Spark's CQL-like syntax. Examples of standard syntax:

BIGINT ARRAY
BIGINT ARRAY[100]
BIGINT ARRAY[100] ARRAY[200]

It would be great to support the standard syntax here.

Some additional details that this addition should cover, IMO:
- Forbid mixed syntax such as ARRAY ARRAY[100]
- Ignore the maximum capacity (ARRAY[N]) but allow it to be specified. This 
seems to be what others (e.g. PostgreSQL) are doing.
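
For illustration, a hedged sketch of how the declarations above might map onto 
Spark SQL's existing types if the capacity is parsed but discarded (an assumed 
mapping, not implemented behaviour):

{code}
import org.apache.spark.sql.types._

// BIGINT ARRAY                 -> ArrayType(LongType)
// BIGINT ARRAY[100]            -> ArrayType(LongType)   (capacity ignored)
// BIGINT ARRAY[100] ARRAY[200] -> ArrayType(ArrayType(LongType))
val nested: DataType = ArrayType(ArrayType(LongType))
{code}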



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6863) Formatted list broken on Hive compatibility section of SQL programming guide

2015-04-11 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-6863:
---

 Summary: Formatted list broken on Hive compatibility section of 
SQL programming guide
 Key: SPARK-6863
 URL: https://issues.apache.org/jira/browse/SPARK-6863
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Affects Versions: 1.3.0
Reporter: Santiago M. Mola
Priority: Trivial


The formatted list in the Hive compatibility section of the SQL programming 
guide is broken: it does not render as a list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6744) Add support for CROSS JOIN syntax

2015-04-07 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-6744:
---

 Summary: Add support for CROSS JOIN syntax
 Key: SPARK-6744
 URL: https://issues.apache.org/jira/browse/SPARK-6744
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.0
 Environment: Add support for the standard CROSS JOIN syntax.
Reporter: Santiago M. Mola
Priority: Minor
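
The requested feature, as stated in the Environment field above, is support for 
the standard CROSS JOIN syntax. For illustration, a hedged sketch of the 
requested form next to the comma-join form that already parses (table names are 
made up, assuming a SQLContext named sqlContext with t1 and t2 registered):

{code}
sqlContext.sql("SELECT * FROM t1 CROSS JOIN t2")  // desired standard syntax
sqlContext.sql("SELECT * FROM t1, t2")            // currently supported equivalent
{code}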






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6743) Join with empty projection on one side produces invalid results

2015-04-07 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-6743:
---

 Summary: Join with empty projection on one side produces invalid 
results
 Key: SPARK-6743
 URL: https://issues.apache.org/jira/browse/SPARK-6743
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: Santiago M. Mola


{code:java}
val sqlContext = new SQLContext(sc)
val tab0 = sc.parallelize(Seq(
  (83,0,38),
  (26,0,79),
  (43,81,24)
))
sqlContext.registerDataFrameAsTable(sqlContext.createDataFrame(tab0), 
"tab0")
sqlContext.cacheTable("tab0")   
val df1 = sqlContext.sql("SELECT tab0._2, cor0._2 FROM tab0, tab0 cor0 GROUP BY 
tab0._2, cor0._2")
val result1 = df1.collect()
val df2 = sqlContext.sql("SELECT cor0._2 FROM tab0, tab0 cor0 GROUP BY cor0._2")
val result2 = df2.collect()
val df3 = sqlContext.sql("SELECT cor0._2 FROM tab0 cor0 GROUP BY cor0._2")
val result3 = df3.collect()
{code}

Given the previous code, result2 equals Row(43), Row(83), Row(26), which is 
wrong: these results correspond to cor0._1 instead of cor0._2. The correct 
results would be Row(0), Row(81), which is exactly what the third query 
returns. The first query also produces valid results; the only difference is 
that the projection on the left side of the join is not empty.
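
A hedged way to state the expected behaviour as a check (it would fail with the 
bug described above; the values follow from the tab0 data):

{code}
// Distinct values of cor0._2 in tab0 are 0 and 81; with the bug, the values
// of cor0._1 (83, 26, 43) come back instead.
assert(result2.map(_.getInt(0)).toSet == Set(0, 81))
{code}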



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6743) Join with empty projection on one side produces invalid results

2015-04-07 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-6743:

Priority: Critical  (was: Major)

> Join with empty projection on one side produces invalid results
> ---
>
> Key: SPARK-6743
> URL: https://issues.apache.org/jira/browse/SPARK-6743
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.3.0
>Reporter: Santiago M. Mola
>Priority: Critical
>
> {code:java}
> val sqlContext = new SQLContext(sc)
> val tab0 = sc.parallelize(Seq(
>   (83,0,38),
>   (26,0,79),
>   (43,81,24)
> ))
> sqlContext.registerDataFrameAsTable(sqlContext.createDataFrame(tab0), 
> "tab0")
> sqlContext.cacheTable("tab0")   
> val df1 = sqlContext.sql("SELECT tab0._2, cor0._2 FROM tab0, tab0 cor0 GROUP 
> BY tab0._2, cor0._2")
> val result1 = df1.collect()
> val df2 = sqlContext.sql("SELECT cor0._2 FROM tab0, tab0 cor0 GROUP BY 
> cor0._2")
> val result2 = df2.collect()
> val df3 = sqlContext.sql("SELECT cor0._2 FROM tab0 cor0 GROUP BY cor0._2")
> val result3 = df3.collect()
> {code}
> Given the previous code, result2 equals Row(43), Row(83), Row(26), which is 
> wrong: these results correspond to cor0._1 instead of cor0._2. The correct 
> results would be Row(0), Row(81), which is exactly what the third query 
> returns. The first query also produces valid results; the only difference is 
> that the projection on the left side of the join is not empty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6741) Add support for SELECT ALL syntax

2015-04-07 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-6741:
---

 Summary: Add support for SELECT ALL syntax
 Key: SPARK-6741
 URL: https://issues.apache.org/jira/browse/SPARK-6741
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.0
Reporter: Santiago M. Mola
Priority: Minor


Support SELECT ALL syntax (equivalent to SELECT, without DISTINCT).
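
For illustration, a minimal sketch of the equivalence (assuming a SQLContext 
named sqlContext and a registered table tab):

{code}
// SELECT ALL is simply the explicit form of the default, non-DISTINCT SELECT.
sqlContext.sql("SELECT ALL _1 FROM tab")  // should behave exactly like:
sqlContext.sql("SELECT _1 FROM tab")
{code}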



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6740) SQL operator and condition precedence is not honoured

2015-04-07 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-6740:
---

 Summary: SQL operator and condition precedence is not honoured
 Key: SPARK-6740
 URL: https://issues.apache.org/jira/browse/SPARK-6740
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
Reporter: Santiago M. Mola


The following query from the SQL Logic Test suite fails to parse:

SELECT DISTINCT * FROM t1 AS cor0 WHERE NOT ( - _2 + - 39 ) IS NULL

while the following (equivalent) does parse correctly:

SELECT DISTINCT * FROM t1 AS cor0 WHERE NOT (( - _2 + - 39 ) IS NULL)

SQLite, MySQL and Oracle (and probably most SQL implementations) define IS with 
higher precedence than NOT, so the first query is valid and well-defined.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6611) Add support for INTEGER as synonym of INT to DDLParser

2015-03-30 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-6611:
---

 Summary: Add support for INTEGER as synonym of INT to DDLParser
 Key: SPARK-6611
 URL: https://issues.apache.org/jira/browse/SPARK-6611
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.3.0
Reporter: Santiago M. Mola
Priority: Minor


Add support for INTEGER as a synonym of INT in DDLParser.
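
For illustration, a hedged example of the DDL this would allow (the data source 
name is made up):

{code}
sqlContext.sql(
  """CREATE TEMPORARY TABLE t (id INTEGER, name STRING)
    |USING my.data.source""".stripMargin)
{code}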



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-6410) Build error on Windows: polymorphic expression cannot be instantiated to expected type

2015-03-19 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola resolved SPARK-6410.
-
Resolution: Not a Problem

This has something to do with the incremental compiler. I got it working after 
running "sbt clean".

> Build error on Windows: polymorphic expression cannot be instantiated to 
> expected type
> --
>
> Key: SPARK-6410
> URL: https://issues.apache.org/jira/browse/SPARK-6410
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.0
> Environment: 
>Reporter: Santiago M. Mola
>  Labels: build-failure
> Attachments: output.log
>
>
> $ bash build/sbt -Phadoop-2.3 assembly
> [...]
> [error] 
> C:\Users\\dev\repos\spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\dsl\package.scala:314:
>  polymorphic expression cannot be instantiated to expected type;
> [error]  found   : [T(in method 
> apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
> [error]  required: 
> org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method 
> functionToUdfBuilder)]
> [error]   implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, 
> T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
> [error]   
>   ^
> [...]
> $ uname -a
> CYGWIN_NT-6.3  1.7.35(0.287/5/3) 2015-03-04 12:09 x86_64 Cygwin
> $ java -version
> java version "1.7.0_75"
> Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)
> $ scala -version
> Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL
> $ build/zinc-0.3.5.3/bin/zinc -version
> zinc (scala incremental compiler) 0.3.5.3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6410) Build error on Windows: polymorphic expression cannot be instantiated to expected type

2015-03-19 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-6410:

Description: 
$ bash build/sbt -Phadoop-2.3 assembly
[...]
[error] 
C:\Users\\dev\repos\spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\dsl\package.scala:314:
 polymorphic expression cannot be instantiated to expected type;
[error]  found   : [T(in method 
apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
[error]  required: 
org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method 
functionToUdfBuilder)]
[error]   implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, T]): 
ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
[error] 
^
[...]

$ uname -a
CYGWIN_NT-6.3  1.7.35(0.287/5/3) 2015-03-04 12:09 x86_64 Cygwin

$ java -version
java version "1.7.0_75"
Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)

$ scala -version
Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL

$ build/zinc-0.3.5.3/bin/zinc -version
zinc (scala incremental compiler) 0.3.5.3

  was:
$ bash build/sbt -Phadoop-2.3 assembly
[...]
[error] 
C:\Users\\dev\repos\spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\dsl\package.scala:314:
 polymorphic expression cannot be instantiated to expected type;
[error]  found   : [T(in method 
apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
[error]  required: 
org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method 
functionToUdfBuilder)]
[error]   implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, T]): 
ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
[error] 
^
[...]

$ uname -a
CYGWIN_NT-6.3 WDFN30003681A 1.7.35(0.287/5/3) 2015-03-04 12:09 x86_64 Cygwin

$ java -version
java version "1.7.0_75"
Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)

$ scala -version
Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL

$ build/zinc-0.3.5.3/bin/zinc -version
zinc (scala incremental compiler) 0.3.5.3


> Build error on Windows: polymorphic expression cannot be instantiated to 
> expected type
> --
>
> Key: SPARK-6410
> URL: https://issues.apache.org/jira/browse/SPARK-6410
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.0
> Environment: 
>Reporter: Santiago M. Mola
>  Labels: build-failure
> Attachments: output.log
>
>
> $ bash build/sbt -Phadoop-2.3 assembly
> [...]
> [error] 
> C:\Users\\dev\repos\spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\dsl\package.scala:314:
>  polymorphic expression cannot be instantiated to expected type;
> [error]  found   : [T(in method 
> apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
> [error]  required: 
> org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method 
> functionToUdfBuilder)]
> [error]   implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, 
> T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
> [error]   
>   ^
> [...]
> $ uname -a
> CYGWIN_NT-6.3  1.7.35(0.287/5/3) 2015-03-04 12:09 x86_64 Cygwin
> $ java -version
> java version "1.7.0_75"
> Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)
> $ scala -version
> Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL
> $ build/zinc-0.3.5.3/bin/zinc -version
> zinc (scala incremental compiler) 0.3.5.3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6410) Build error on Windows: polymorphic expression cannot be instantiated to expected type

2015-03-19 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-6410:

Attachment: output.log

Full error log.

> Build error on Windows: polymorphic expression cannot be instantiated to 
> expected type
> --
>
> Key: SPARK-6410
> URL: https://issues.apache.org/jira/browse/SPARK-6410
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.0
> Environment: 
>Reporter: Santiago M. Mola
>  Labels: build-failure
> Attachments: output.log
>
>
> $ bash build/sbt -Phadoop-2.3 assembly
> [...]
> [error] 
> C:\Users\\dev\repos\spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\dsl\package.scala:314:
>  polymorphic expression cannot be instantiated to expected type;
> [error]  found   : [T(in method 
> apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
> [error]  required: 
> org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method 
> functionToUdfBuilder)]
> [error]   implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, 
> T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
> [error]   
>   ^
> [...]
> $ uname -a
> CYGWIN_NT-6.3 WDFN30003681A 1.7.35(0.287/5/3) 2015-03-04 12:09 x86_64 Cygwin
> $ java -version
> java version "1.7.0_75"
> Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)
> $ scala -version
> Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL
> $ build/zinc-0.3.5.3/bin/zinc -version
> zinc (scala incremental compiler) 0.3.5.3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-6410) Build error on Windows: polymorphic expression cannot be instantiated to expected type

2015-03-19 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-6410:

Comment: was deleted

(was: Full error log.)

> Build error on Windows: polymorphic expression cannot be instantiated to 
> expected type
> --
>
> Key: SPARK-6410
> URL: https://issues.apache.org/jira/browse/SPARK-6410
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.0
> Environment: 
>Reporter: Santiago M. Mola
>  Labels: build-failure
> Attachments: output.log
>
>
> $ bash build/sbt -Phadoop-2.3 assembly
> [...]
> [error] 
> C:\Users\\dev\repos\spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\dsl\package.scala:314:
>  polymorphic expression cannot be instantiated to expected type;
> [error]  found   : [T(in method 
> apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
> [error]  required: 
> org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method 
> functionToUdfBuilder)]
> [error]   implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, 
> T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
> [error]   
>   ^
> [...]
> $ uname -a
> CYGWIN_NT-6.3 WDFN30003681A 1.7.35(0.287/5/3) 2015-03-04 12:09 x86_64 Cygwin
> $ java -version
> java version "1.7.0_75"
> Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)
> $ scala -version
> Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL
> $ build/zinc-0.3.5.3/bin/zinc -version
> zinc (scala incremental compiler) 0.3.5.3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6410) Build error on Windows: polymorphic expression cannot be instantiated to expected type

2015-03-19 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-6410:

Environment: 



  was:
$ uname -a
CYGWIN_NT-6.3 WDFN30003681A 1.7.35(0.287/5/3) 2015-03-04 12:09 x86_64 Cygwin

$ java -version
java version "1.7.0_75"
Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)

$ scala -version
Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL

$ build/zinc-0.3.5.3/bin/zinc -version
zinc (scala incremental compiler) 0.3.5.3



> Build error on Windows: polymorphic expression cannot be instantiated to 
> expected type
> --
>
> Key: SPARK-6410
> URL: https://issues.apache.org/jira/browse/SPARK-6410
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.0
> Environment: 
>Reporter: Santiago M. Mola
>  Labels: build-failure
>
> $ bash build/sbt -Phadoop-2.3 assembly
> [...]
> [error] 
> C:\Users\\dev\repos\spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\dsl\package.scala:314:
>  polymorphic expression cannot be instantiated to expected type;
> [error]  found   : [T(in method 
> apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
> [error]  required: 
> org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method 
> functionToUdfBuilder)]
> [error]   implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, 
> T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
> [error]   
>   ^
> [...]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6410) Build error on Windows: polymorphic expression cannot be instantiated to expected type

2015-03-19 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-6410:

Labels: build-failure  (was: )

> Build error on Windows: polymorphic expression cannot be instantiated to 
> expected type
> --
>
> Key: SPARK-6410
> URL: https://issues.apache.org/jira/browse/SPARK-6410
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.0
> Environment: 
>Reporter: Santiago M. Mola
>  Labels: build-failure
>
> $ bash build/sbt -Phadoop-2.3 assembly
> [...]
> [error] 
> C:\Users\\dev\repos\spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\dsl\package.scala:314:
>  polymorphic expression cannot be instantiated to expected type;
> [error]  found   : [T(in method 
> apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
> [error]  required: 
> org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method 
> functionToUdfBuilder)]
> [error]   implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, 
> T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
> [error]   
>   ^
> [...]
> $ uname -a
> CYGWIN_NT-6.3 WDFN30003681A 1.7.35(0.287/5/3) 2015-03-04 12:09 x86_64 Cygwin
> $ java -version
> java version "1.7.0_75"
> Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)
> $ scala -version
> Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL
> $ build/zinc-0.3.5.3/bin/zinc -version
> zinc (scala incremental compiler) 0.3.5.3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6410) Build error on Windows: polymorphic expression cannot be instantiated to expected type

2015-03-19 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-6410:

Component/s: SQL

> Build error on Windows: polymorphic expression cannot be instantiated to 
> expected type
> --
>
> Key: SPARK-6410
> URL: https://issues.apache.org/jira/browse/SPARK-6410
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.0
> Environment: 
>Reporter: Santiago M. Mola
>  Labels: build-failure
>
> $ bash build/sbt -Phadoop-2.3 assembly
> [...]
> [error] 
> C:\Users\\dev\repos\spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\dsl\package.scala:314:
>  polymorphic expression cannot be instantiated to expected type;
> [error]  found   : [T(in method 
> apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
> [error]  required: 
> org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method 
> functionToUdfBuilder)]
> [error]   implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, 
> T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
> [error]   
>   ^
> [...]
> $ uname -a
> CYGWIN_NT-6.3 WDFN30003681A 1.7.35(0.287/5/3) 2015-03-04 12:09 x86_64 Cygwin
> $ java -version
> java version "1.7.0_75"
> Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)
> $ scala -version
> Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL
> $ build/zinc-0.3.5.3/bin/zinc -version
> zinc (scala incremental compiler) 0.3.5.3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6410) Build error on Windows: polymorphic expression cannot be instantiated to expected type

2015-03-19 Thread Santiago M. Mola (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-6410:

Description: 
$ bash build/sbt -Phadoop-2.3 assembly
[...]
[error] 
C:\Users\\dev\repos\spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\dsl\package.scala:314:
 polymorphic expression cannot be instantiated to expected type;
[error]  found   : [T(in method 
apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
[error]  required: 
org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method 
functionToUdfBuilder)]
[error]   implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, T]): 
ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
[error] 
^
[...]

$ uname -a
CYGWIN_NT-6.3 WDFN30003681A 1.7.35(0.287/5/3) 2015-03-04 12:09 x86_64 Cygwin

$ java -version
java version "1.7.0_75"
Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)

$ scala -version
Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL

$ build/zinc-0.3.5.3/bin/zinc -version
zinc (scala incremental compiler) 0.3.5.3

  was:
$ bash build/sbt -Phadoop-2.3 assembly
[...]
[error] 
C:\Users\\dev\repos\spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\dsl\package.scala:314:
 polymorphic expression cannot be instantiated to expected type;
[error]  found   : [T(in method 
apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
[error]  required: 
org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method 
functionToUdfBuilder)]
[error]   implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, T]): 
ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
[error] 
^
[...]



> Build error on Windows: polymorphic expression cannot be instantiated to 
> expected type
> --
>
> Key: SPARK-6410
> URL: https://issues.apache.org/jira/browse/SPARK-6410
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.0
> Environment: 
>Reporter: Santiago M. Mola
>  Labels: build-failure
>
> $ bash build/sbt -Phadoop-2.3 assembly
> [...]
> [error] 
> C:\Users\\dev\repos\spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\dsl\package.scala:314:
>  polymorphic expression cannot be instantiated to expected type;
> [error]  found   : [T(in method 
> apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
> [error]  required: 
> org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method 
> functionToUdfBuilder)]
> [error]   implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, 
> T]): ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
> [error]   
>   ^
> [...]
> $ uname -a
> CYGWIN_NT-6.3 WDFN30003681A 1.7.35(0.287/5/3) 2015-03-04 12:09 x86_64 Cygwin
> $ java -version
> java version "1.7.0_75"
> Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)
> $ scala -version
> Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL
> $ build/zinc-0.3.5.3/bin/zinc -version
> zinc (scala incremental compiler) 0.3.5.3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6410) Build error on Windows: polymorphic expression cannot be instantiated to expected type

2015-03-19 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-6410:
---

 Summary: Build error on Windows: polymorphic expression cannot be 
instantiated to expected type
 Key: SPARK-6410
 URL: https://issues.apache.org/jira/browse/SPARK-6410
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.4.0
 Environment: $ uname -a
CYGWIN_NT-6.3 WDFN30003681A 1.7.35(0.287/5/3) 2015-03-04 12:09 x86_64 Cygwin

$ java -version
java version "1.7.0_75"
Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)

$ scala -version
Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL

$ build/zinc-0.3.5.3/bin/zinc -version
zinc (scala incremental compiler) 0.3.5.3

Reporter: Santiago M. Mola


$ bash build/sbt -Phadoop-2.3 assembly
[...]
[error] 
C:\Users\\dev\repos\spark\sql\catalyst\src\main\scala\org\apache\spark\sql\catalyst\dsl\package.scala:314:
 polymorphic expression cannot be instantiated to expected type;
[error]  found   : [T(in method 
apply)]org.apache.spark.sql.catalyst.dsl.ScalaUdfBuilder[T(in method apply)]
[error]  required: 
org.apache.spark.sql.catalyst.dsl.package.ScalaUdfBuilder[T(in method 
functionToUdfBuilder)]
[error]   implicit def functionToUdfBuilder[T: TypeTag](func: Function1[_, T]): 
ScalaUdfBuilder[T] = ScalaUdfBuilder(func)
[error] 
^
[...]




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6320) Adding new query plan strategy to SQLContext

2015-03-19 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368683#comment-14368683
 ] 

Santiago M. Mola commented on SPARK-6320:
-

[~marmbrus] We could change strategies so that they take a SparkPlanner in 
their constructor. This should provide enough flexibility for [~H.Youssef]]'s 
use case and might improve code organization of the core strategies in the 
future.
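
For illustration, a minimal sketch of the suggested shape (imports elided; it 
assumes SparkPlanner, today an inner class of SQLContext, becomes injectable and 
exposes planLater to strategies, and MyLogicalNode/MyPhysicalNode stand in for 
third-party plan nodes):

{code}
class MyStrategy(planner: SparkPlanner) extends Strategy {
  def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    case MyLogicalNode(child) =>
      // hand the child back to the planner instead of planning it here
      MyPhysicalNode(planner.planLater(child)) :: Nil
    case _ => Nil
  }
}
{code}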

> Adding new query plan strategy to SQLContext
> 
>
> Key: SPARK-6320
> URL: https://issues.apache.org/jira/browse/SPARK-6320
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.3.0
>Reporter: Youssef Hatem
>Priority: Minor
>
> Hi,
> I would like to add a new strategy to {{SQLContext}}. To do this I created a 
> new class which extends {{Strategy}}. In my new class I need to call the 
> {{planLater}} function. However, this method is defined in {{SparkPlanner}} 
> (which itself inherits the method from {{QueryPlanner}}).
> To my knowledge, the only way to make the {{planLater}} function visible to my 
> new strategy is to define my strategy inside another class that extends 
> {{SparkPlanner}} and thus inherits {{planLater}}; by doing so, I will have to 
> extend {{SQLContext}} so that I can override the {{planner}} field with the 
> new {{Planner}} class I created.
> It seems that this is a design problem because adding a new strategy seems to 
> require extending {{SQLContext}} (unless I am doing it wrong and there is a 
> better way to do it).
> Thanks a lot,
> Youssef



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-6320) Adding new query plan strategy to SQLContext

2015-03-19 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368683#comment-14368683
 ] 

Santiago M. Mola edited comment on SPARK-6320 at 3/19/15 8:12 AM:
--

[~marmbrus] We could change strategies so that they take a SparkPlanner in 
their constructor. This should provide enough flexibility for [~H.Youssef]'s 
use case and might improve code organization of the core strategies in the 
future.


was (Author: smolav):
[~marmbrus] We could change strategies so that they take a SparkPlanner in 
their constructor. This should provide enough flexibility for [~H.Youssef]]'s 
use case and might improve code organization of the core strategies in the 
future.

> Adding new query plan strategy to SQLContext
> 
>
> Key: SPARK-6320
> URL: https://issues.apache.org/jira/browse/SPARK-6320
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.3.0
>Reporter: Youssef Hatem
>Priority: Minor
>
> Hi,
> I would like to add a new strategy to {{SQLContext}}. To do this I created a 
> new class which extends {{Strategy}}. In my new class I need to call the 
> {{planLater}} function. However, this method is defined in {{SparkPlanner}} 
> (which itself inherits the method from {{QueryPlanner}}).
> To my knowledge, the only way to make the {{planLater}} function visible to my 
> new strategy is to define my strategy inside another class that extends 
> {{SparkPlanner}} and thus inherits {{planLater}}; by doing so, I will have to 
> extend {{SQLContext}} so that I can override the {{planner}} field with the 
> new {{Planner}} class I created.
> It seems that this is a design problem because adding a new strategy seems to 
> require extending {{SQLContext}} (unless I am doing it wrong and there is a 
> better way to do it).
> Thanks a lot,
> Youssef



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6397) Check the missingInput simply

2015-03-18 Thread Santiago M. Mola (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367437#comment-14367437
 ] 

Santiago M. Mola commented on SPARK-6397:
-

I think a proper title would be: Override QueryPlan.missingInput when necessary 
and rely on it in CheckAnalysis.
And description: Currently, some LogicalPlans do not override missingInput even 
though they should, so the lack of proper missingInput implementations leaks 
into CheckAnalysis.

(I'm about to create a pull request that fixes this problem in some more places)
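
For illustration, a minimal sketch of the idea: a custom node that introduces 
its own output attributes overrides missingInput so that CheckAnalysis does not 
flag those attributes as unresolved (MyGenerateLike and generatedAttributes are 
made-up names, not Spark API):

{code}
import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeSet}
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, UnaryNode}

case class MyGenerateLike(generatedAttributes: Seq[Attribute], child: LogicalPlan)
  extends UnaryNode {
  override def output: Seq[Attribute] = child.output ++ generatedAttributes
  // Attributes produced by the node itself are not missing input.
  override def missingInput: AttributeSet =
    super.missingInput -- generatedAttributes
}
{code}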



> Check the missingInput simply
> -
>
> Key: SPARK-6397
> URL: https://issues.apache.org/jira/browse/SPARK-6397
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Yadong Qi
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-4799) Spark should not rely on local host being resolvable on every node

2014-12-09 Thread Santiago M. Mola (JIRA)
Santiago M. Mola created SPARK-4799:
---

 Summary: Spark should not rely on local host being resolvable on 
every node
 Key: SPARK-4799
 URL: https://issues.apache.org/jira/browse/SPARK-4799
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.1.0
 Environment: Tested a Spark+Mesos cluster on top of Docker to 
reproduce the issue.
Reporter: Santiago M. Mola


Spark fails when a node's hostname is not resolvable by other nodes.

See an example trace:

{code}
14/12/09 17:02:41 ERROR SendingConnection: Error connecting to 
27e434cf36ac:35093
java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:127)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:644)
at 
org.apache.spark.network.SendingConnection.connect(Connection.scala:299)
at 
org.apache.spark.network.ConnectionManager.run(ConnectionManager.scala:278)
at 
org.apache.spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:139)
{code}

The relevant code is here:
https://github.com/apache/spark/blob/bcb5cdad614d4fce43725dfec3ce88172d2f8c11/core/src/main/scala/org/apache/spark/network/nio/ConnectionManager.scala#L170

{code}
val id = new ConnectionManagerId(Utils.localHostName, 
serverChannel.socket.getLocalPort)
{code}

This piece of code should use the host IP via Utils.localIpAddress, or a method 
that acknowledges user settings (e.g. SPARK_LOCAL_IP). Since I cannot think of 
a use case for using the hostname here, I'm creating a PR with the former 
solution, but if you think the latter is better, I'm willing to create a new PR 
with a more elaborate fix.
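
For illustration, a one-line sketch of the former solution (binding the id to 
the local IP instead of the hostname; whether Utils.localIpAddress also honours 
SPARK_LOCAL_IP is part of the discussion above):

{code}
val id = new ConnectionManagerId(Utils.localIpAddress,
  serverChannel.socket.getLocalPort)
{code}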



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


