[jira] [Created] (SPARK-23600) conda_panda_example test fails to import panda lib with Spark 2.3

2018-03-05 Thread Supreeth Sharma (JIRA)
Supreeth Sharma created SPARK-23600:
---

 Summary: conda_panda_example test fails to import panda lib with 
Spark 2.3
 Key: SPARK-23600
 URL: https://issues.apache.org/jira/browse/SPARK-23600
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit
Affects Versions: 2.3.0
 Environment: ambari-server --version 2.7.0.2-64

HDP-3.0.0.2-132
Reporter: Supreeth Sharma
 Fix For: 2.3.0


With Spark 2.3, the conda_panda_example test fails to import pandas.

Python version: Python 2.7.5

1) Create the requirements file.
virtual_env_type : Native
{code:java}
packaging==16.8
panda==0.3.1
pyparsing==2.1.10
requests==2.13.0
six==1.10.0
numpy==1.12.0
pandas==0.19.2
python-dateutil==2.6.0
pytz==2016.10
{code}
virtual_env_type : conda
{code:java}
mkl=2017.0.1=0
numpy=1.12.0=py27_0
openssl=1.0.2k=0
pandas=0.19.2=np112py27_1
pip=9.0.1=py27_1
python=2.7.13=0
python-dateutil=2.6.0=py27_0
pytz=2016.10=py27_0
readline=6.2=2
setuptools=27.2.0=py27_0
six=1.10.0=py27_0
sqlite=3.13.0=0
tk=8.5.18=0
wheel=0.29.0=py27_0
zlib=1.2.8=3
{code}
2) Run the conda_panda_example test
{code:java}
spark-submit  --master yarn-client --jars 
/usr/hdp/current/hadoop-client/lib/hadoop-lzo-0.6.0.3.0.0.2-132.jar --conf 
spark.pyspark.virtualenv.enabled=true --conf 
spark.pyspark.virtualenv.type=native --conf 
spark.pyspark.virtualenv.requirements=/tmp/requirements.txt --conf 
spark.pyspark.virtualenv.bin.path=/usr/bin/virtualenv 
/hwqe/hadoopqe/tests/spark/data/conda_panda_example.py   2>&1 | tee 
/tmp/1/Spark_clientLogs/pyenv_conda_panda_example_native_yarn-client.log
{code}
3) The application fails to import pandas.
{code:java}
2018-03-05 13:43:31,493|INFO|MainThread|machine.py:167 - 
run()||GUID=a3cb88f7-bf55-4d9e-9cfe-3e44eae3a72b|18/03/05 13:43:31 INFO 
YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning 
after reached minRegisteredResourcesRatio: 0.8
2018-03-05 13:43:31,527|INFO|MainThread|machine.py:167 - 
run()||GUID=a3cb88f7-bf55-4d9e-9cfe-3e44eae3a72b|Traceback (most recent call 
last):
2018-03-05 13:43:31,527|INFO|MainThread|machine.py:167 - 
run()||GUID=a3cb88f7-bf55-4d9e-9cfe-3e44eae3a72b|File 
"/hwqe/hadoopqe/tests/spark/data/conda_panda_example.py", line 5, in 
2018-03-05 13:43:31,528|INFO|MainThread|machine.py:167 - 
run()||GUID=a3cb88f7-bf55-4d9e-9cfe-3e44eae3a72b|import pandas as pd
2018-03-05 13:43:31,528|INFO|MainThread|machine.py:167 - 
run()||GUID=a3cb88f7-bf55-4d9e-9cfe-3e44eae3a72b|ImportError: No module named 
pandas
2018-03-05 13:43:31,547|INFO|MainThread|machine.py:167 - 
run()||GUID=a3cb88f7-bf55-4d9e-9cfe-3e44eae3a72b|18/03/05 13:43:31 INFO 
BlockManagerMasterEndpoint: Registering block manager 
ctr-e138-1518143905142-67599-01-05.hwx.site:44861 with 366.3 MB RAM, 
BlockManagerId(2, ctr-e138-1518143905142-67599-01-05.hwx.site, 44861, 
None){code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23496) Locality of coalesced partitions can be severely skewed by the order of input partitions

2018-03-05 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-23496.
---
   Resolution: Fixed
 Assignee: Ala Luszczak
Fix Version/s: 2.4.0

> Locality of coalesced partitions can be severely skewed by the order of input 
> partitions
> 
>
> Key: SPARK-23496
> URL: https://issues.apache.org/jira/browse/SPARK-23496
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Ala Luszczak
>Assignee: Ala Luszczak
>Priority: Major
> Fix For: 2.4.0
>
>
> Example:
> Consider RDD "R" with 100 partitions, half of which have locality preference 
> "hostA" and half have "hostB".
>  * Assume odd-numbered input partitions of R prefer "hostA" and even-numbered 
> prefer "hostB". Then R.coalesce(50) will have 25 partitions with preference 
> "hostA" and 25 with "hostB" (even distribution).
>  * Assume partitions with index 0-49 of R prefer "hostA" and partitions with 
> index 50-99 prefer "hostB". Then R.coalesce(50) will have 49 partitions with 
> "hostA" and 1 with "hostB" (extremely skewed distribution).
>  
> The algorithm in {{DefaultPartitionCoalescer.setupGroups}} is responsible for 
> picking preferred locations for coalesced partitions. It analyzes the 
> preferred locations of input partitions. It starts by trying to create one 
> partition for each unique location in the input. However, if the 
> requested number of coalesced partitions is higher than the number of unique 
> locations, it has to pick duplicate locations.
> Currently, the duplicate locations are picked by iterating over the input 
> partitions in order, and copying their preferred locations to coalesced 
> partitions. If the input partitions are clustered by location, this can 
> result in severe skew.
> Instead of iterating over the list of input partitions in order, we should 
> pick them at random.
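
As a rough illustration of the skew described above (a sketch, not the reporter's code; it assumes a running SparkContext {{sc}}, e.g. from spark-shell):
{code:scala}
// Clustered input: partitions 0-49 prefer "hostA", partitions 50-99 prefer "hostB".
val clustered = sc.makeRDD(
  (0 until 100).map(i => (i, Seq(if (i < 50) "hostA" else "hostB"))))

val coalesced = clustered.coalesce(50)

// Tally coalesced partitions by their preferred location. With clustered input
// and the current DefaultPartitionCoalescer this comes out heavily skewed
// (on the order of 49 vs 1); alternating locations stay roughly balanced.
val byLocation = coalesced.partitions
  .map(p => coalesced.preferredLocations(p).headOption.getOrElse("none"))
  .groupBy(identity)
  .mapValues(_.length)
println(byLocation)
{code}
Picking the duplicate locations at random instead of in input order, as proposed above, should make both input orderings behave the same.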



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23599) The UUID() expression is too non-deterministic

2018-03-05 Thread Herman van Hovell (JIRA)
Herman van Hovell created SPARK-23599:
-

 Summary: The UUID() expression is too non-deterministic
 Key: SPARK-23599
 URL: https://issues.apache.org/jira/browse/SPARK-23599
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.0
Reporter: Herman van Hovell


The current {{Uuid()}} expression uses {{java.util.UUID.randomUUID}} for UUID 
generation. There are a couple of major problems with this:
- It is non-deterministic across task retries. This breaks Spark's processing 
model, and will lead to very hard to trace bugs, like non-deterministic 
shuffles, duplicates and missing rows.
- It uses a single SecureRandom instance for UUID generation. This takes a 
single JVM-wide lock, which can lead to lock contention and other performance 
problems.

We should move to something that is deterministic between retries. This can be 
done by using seeded PRNGs for which we set the seed during planning. It is 
important here to use a PRNG that provides enough entropy for creating a proper 
UUID.
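
A hedged sketch of that direction (illustrative only; {{planSeed}} and {{partitionIndex}} are made-up names, not the eventual Spark API):
{code:scala}
import java.util.UUID
import scala.util.Random

// Deterministic UUID stream: the seed is fixed once (e.g. at planning time),
// so a task retry that replays the same partition regenerates the same values.
def seededUuids(planSeed: Long, partitionIndex: Int): Iterator[UUID] = {
  val rng = new Random(planSeed + partitionIndex)
  Iterator.continually {
    // Set the version (4) and IETF variant bits so the result is a well-formed UUID.
    val most  = (rng.nextLong() & 0xffffffffffff0fffL) | 0x0000000000004000L
    val least = (rng.nextLong() & 0x3fffffffffffffffL) | 0x8000000000000000L
    new UUID(most, least)
  }
}

// Two "attempts" of the same partition produce identical UUIDs.
val attempt1 = seededUuids(42L, 0).take(3).toList
val attempt2 = seededUuids(42L, 0).take(3).toList
assert(attempt1 == attempt2)
{code}
{{java.util.Random}} is used here only for illustration; as the description says, the PRNG actually chosen should provide enough entropy to produce proper UUIDs.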



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23599) The UUID() expression is too non-deterministic

2018-03-05 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell updated SPARK-23599:
--
Priority: Critical  (was: Major)

> The UUID() expression is too non-deterministic
> --
>
> Key: SPARK-23599
> URL: https://issues.apache.org/jira/browse/SPARK-23599
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Herman van Hovell
>Priority: Critical
>
> The current {{Uuid()}} expression uses {{java.util.UUID.randomUUID}} for UUID 
> generation. There are a couple of major problems with this:
> - It is non-deterministic across task retries. This breaks Spark's processing 
> model, and will lead to very hard to trace bugs, like non-deterministic 
> shuffles, duplicates and missing rows.
> - It uses a single SecureRandom instance for UUID generation. This takes a 
> single JVM-wide lock, which can lead to lock contention and other performance 
> problems.
> We should move to something that is deterministic between retries. This can 
> be done by using seeded PRNGs for which we set the seed during planning. It 
> is important here to use a PRNG that provides enough entropy for creating a 
> proper UUID.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23598) WholeStageCodegen can lead to IllegalAccessError calling append for HashAggregateExec

2018-03-05 Thread David Vogelbacher (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Vogelbacher updated SPARK-23598:
--
Description: 
Got the following stacktrace for a large QueryPlan using WholeStageCodeGen:
{noformat}
java.lang.IllegalAccessError: tried to access method 
org.apache.spark.sql.execution.BufferedRowIterator.append(Lorg/apache/spark/sql/catalyst/InternalRow;)V
 from class 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass.agg_doAggregateWithKeysOutput$(Unknown
 Source)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at 
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345){noformat}
After disabling codegen, everything works.

The root cause seems to be that we are trying to call the protected _append_ 
method of 
[BufferedRowIterator|https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/BufferedRowIterator.java#L68]
 from an inner-class of a sub-class that is loaded by a different class-loader 
(after codegen compilation).

[https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-5.html#jvms-5.4.4] 
states that a protected method _R_ can be accessed only if one of the following 
two conditions is fulfilled:
 # R is protected and is declared in a class C, and D is either a subclass of C 
or C itself. Furthermore, if R is not static, then the symbolic reference to R 
must contain a symbolic reference to a class T, such that T is either a 
subclass of D, a superclass of D, or D itself.
 # R is either protected or has default access (that is, neither public nor 
protected nor private), and is declared by a class in the same run-time package 
as D.

2.) doesn't apply as we have loaded the class with a different class loader 
(and are in a different package) and 1.) doesn't apply because we are 
apparently trying to call the method from an inner class of a subclass of 
_BufferedRowIterator_.

Looking at the Code path of _WholeStageCodeGen_, the following happens:
 # In 
[WholeStageCodeGen|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala#L527],
 we create the subclass of _BufferedRowIterator_, along with a _processNext_ 
method for processing the output of the child plan.
 # In the child, which is a 
[HashAggregateExec|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L517],
 we create the method which shows up at the top of the stack trace (called 
_doAggregateWithKeysOutput_ )
 # We add this method to the compiled code by invoking _addNewFunction_ of 
[CodeGenerator|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L460]. 
In the generated function body we call the _append_ method.

Now, the _addNewFunction_ method states that:
{noformat}
If the code for the `OuterClass` grows too large, the function will be inlined 
into a new private, inner class
{noformat}
This indeed seems to happen: the _doAggregateWithKeysOutput_ method is put into 
a new private inner class. Thus, it doesn't have access to the protected 
_append_ method anymore but still tries to call it, which results in the 
_IllegalAccessError._ 

Possible fixes:
 * Pass in the _inlineToOuterClass_ flag when invoking the _addNewFunction_
 * Make the _append_ method public
 * Re-declare the _append_ method in the generated subclass (just invoking 
_super_). This way, inner classes should have access to it.

 

  was:
Got the following stacktrace for a large QueryPlan using WholeStageCodeGen:
{noformat}
java.lang.IllegalAccessError: tried to access method 
org.apache.spark.sql.execution.BufferedRowIterator.append(Lorg/apache/spark/sql/catalyst/InternalRow;)V
 from class 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass
at 

[jira] [Updated] (SPARK-23598) WholeStageCodegen can lead to IllegalAccessError calling append for HashAggregateExec

2018-03-05 Thread David Vogelbacher (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Vogelbacher updated SPARK-23598:
--
Description: 
Got the following stacktrace for a large QueryPlan using WholeStageCodeGen:
{noformat}
java.lang.IllegalAccessError: tried to access method 
org.apache.spark.sql.execution.BufferedRowIterator.append(Lorg/apache/spark/sql/catalyst/InternalRow;)V
 from class 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass.agg_doAggregateWithKeysOutput$(Unknown
 Source)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at 
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345){noformat}
After disabling codegen, everything works.

The root cause seems to be that we are trying to call the protected _append_ 
method of 
[BufferedRowIterator|https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/BufferedRowIterator.java#L68]
 from an inner-class of a sub-class that is loaded by a different class-loader 
(after codegen compilation).

[https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-5.html#jvms-5.4.4] 
states that a protected method _R_ can be accessed only if one of the following 
two conditions is fulfilled:
 # R is protected and is declared in a class C, and D is either a subclass of C 
or C itself. Furthermore, if R is not static, then the symbolic reference to R 
must contain a symbolic reference to a class T, such that T is either a 
subclass of D, a superclass of D, or D itself.
 # R is either protected or has default access (that is, neither public nor 
protected nor private), and is declared by a class in the same run-time package 
as D.

2.) doesn't apply as we have loaded the class with a different class loader 
(and are in a different package) and 1.) doesn't apply because we are 
apparently trying to call the method from an inner class of a subclass of 
_BufferedRowIterator_.

Looking at the Code path of _WholeStageCodeGen_, the following happens:
 # In 
[WholeStageCodeGen|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala#L527],
 we create the subclass of _BufferedRowIterator_, along with a _processNext_ 
method for processing the output of the child plan.
 # In the child, which is a 
[HashAggregateExec|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L517],
 we create the method which shows up at the top of the stack trace (called 
_doAggregateWithKeysOutput_ )
 # We add this method to the compiled code by invoking _addNewFunction_ of 
[CodeGenerator|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L460]. 
In the generated function body we call the _append_ method.

Now, the _addNewFunction_ method states that:
{noformat}
If the code for the `OuterClass` grows too large, the function will be inlined 
into a new private, inner class
{noformat}
This indeed seems to happen: the _doAggregateWithKeysOutput_ method is put into 
a new private inner class. Thus, it doesn't have access to the protected 
_append_ method anymore but still tries to call it, which results in the 
_IllegalAccessError._ 

Possible fixes:
 * Pass in the _inlineToOuterClass_ flag when invoking the _addNewFunction_
 * Make the _append_ method public
 * Re-declare the _append_ method in the generated subclass (just invoking 
_super_). This way, inner classes should have access to it.

 

  was:
Got the following stacktrace for a large QueryPlan using WholeStageCodeGen:
{noformat}
java.lang.IllegalAccessError: tried to access method 
org.apache.spark.sql.execution.BufferedRowIterator.append(Lorg/apache/spark/sql/catalyst/InternalRow;)V
 from class 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass
at 

[jira] [Updated] (SPARK-23598) WholeStageCodegen can lead to IllegalAccessError calling append for HashAggregateExec

2018-03-05 Thread David Vogelbacher (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Vogelbacher updated SPARK-23598:
--
Description: 
Got the following stacktrace for a large QueryPlan using WholeStageCodeGen:
{noformat}
java.lang.IllegalAccessError: tried to access method 
org.apache.spark.sql.execution.BufferedRowIterator.append(Lorg/apache/spark/sql/catalyst/InternalRow;)V
 from class 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass.agg_doAggregateWithKeysOutput$(Unknown
 Source)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at 
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345){noformat}
After disabling codegen, everything works.

The root cause seems to be that we are trying to call the protected _append_ 
method of 
[BufferedRowIterator|https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/BufferedRowIterator.java#L68]
 from an inner-class of a sub-class that is loaded by a different class-loader 
(after codegen compilation).

[https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-5.html#jvms-5.4.4] 
states that a protected method _R_ can be accessed only if one of the following 
two conditions is fulfilled:
 # R is protected and is declared in a class C, and D is either a subclass of C 
or C itself. Furthermore, if R is not static, then the symbolic reference to R 
must contain a symbolic reference to a class T, such that T is either a 
subclass of D, a superclass of D, or D itself.
 # R is either protected or has default access (that is, neither public nor 
protected nor private), and is declared by a class in the same run-time package 
as D.

2.) doesn't apply as we have loaded the class with a different class loader 
(and are in a different package) and 1.) doesn't apply because we are 
apparently trying to call the method from an inner class of a subclass of 
_BufferedRowIterator_.

Looking at the Code path of _WholeStageCodeGen_, the following happens:
 # In 
[WholeStageCodeGen|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala#L527],
 we create the subclass of _BufferedRowIterator_, along with a _processNext_ 
method for processing the output of the child plan.
 # In the child, which is a 
[HashAggregateExec|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L517],
 we create the method which shows up at the top of the stack trace (called 
_doAggregateWithKeysOutput_ )
 # We add this method to the compiled code by invoking _addNewFunction_ of 
[CodeGenerator|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L460]. 
In the generated function body we call the _append_ method.

Now, the _addNewFunction_ method states that:
{noformat}
If the code for the `OuterClass` grows too large, the function will be inlined 
into a new private, inner class
{noformat}
This indeed seems to happen: the _doAggregateWithKeysOutput_ method is put into 
a new private inner class. Thus, it doesn't have access to the protected 
_append_ method anymore but still tries to call it, which results in the 
_IllegalAccessError._ 

Possible fixes:
 * Pass in the _inlineToOuterClass_ flag when invoking the _addNewFunction_
 * Make the _append_ method public
 * Re-declare the _append_ method in the generated subclass (just invoking 
_super_). This way, inner classes should have access to it.

 

  was:
Got the following stacktrace for a large QueryPlan using WholeStageCodeGen:
{noformat}
java.lang.IllegalAccessError: tried to access method 
org.apache.spark.sql.execution.BufferedRowIterator.append(Lorg/apache/spark/sql/catalyst/InternalRow;)V
 from class 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass
at 

[jira] [Assigned] (SPARK-23586) Add interpreted execution for WrapOption expression

2018-03-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23586:


Assignee: Apache Spark

> Add interpreted execution for WrapOption expression
> ---
>
> Key: SPARK-23586
> URL: https://issues.apache.org/jira/browse/SPARK-23586
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Herman van Hovell
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23586) Add interpreted execution for WrapOption expression

2018-03-05 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-23586:


Assignee: (was: Apache Spark)

> Add interpreted execution for WrapOption expression
> ---
>
> Key: SPARK-23586
> URL: https://issues.apache.org/jira/browse/SPARK-23586
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Herman van Hovell
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23586) Add interpreted execution for WrapOption expression

2018-03-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386069#comment-16386069
 ] 

Apache Spark commented on SPARK-23586:
--

User 'mgaido91' has created a pull request for this issue:
https://github.com/apache/spark/pull/20736

> Add interpreted execution for WrapOption expression
> ---
>
> Key: SPARK-23586
> URL: https://issues.apache.org/jira/browse/SPARK-23586
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Herman van Hovell
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23598) WholeStageCodegen can lead to IllegalAccessError calling append for HashAggregateExec

2018-03-05 Thread David Vogelbacher (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Vogelbacher updated SPARK-23598:
--
Description: 
Got the following stacktrace for a large QueryPlan using WholeStageCodeGen:
{noformat}
java.lang.IllegalAccessError: tried to access method 
org.apache.spark.sql.execution.BufferedRowIterator.append(Lorg/apache/spark/sql/catalyst/InternalRow;)V
 from class 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass.agg_doAggregateWithKeysOutput$(Unknown
 Source)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at 
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345){noformat}
After disabling codegen, everything works.

The root cause seems to be that we are trying to call the protected _append_ 
method of 
[BufferedRowIterator|https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/BufferedRowIterator.java#L68]
 from an inner-class of a sub-class that is loaded by a different class-loader 
(after codegen compilation).

[https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-5.html#jvms-5.4.4] 
states that a protected method _R_ can be accessed only if one of the following 
two conditions is fulfilled:
 # R is protected and is declared in a class C, and D is either a subclass of C 
or C itself. Furthermore, if R is not static, then the symbolic reference to R 
must contain a symbolic reference to a class T, such that T is either a 
subclass of D, a superclass of D, or D itself.
 # R is either protected or has default access (that is, neither public nor 
protected nor private), and is declared by a class in the same run-time package 
as D.

2.) doesn't apply as we have loaded the class with a different class loader 
(and are in a different package) and 1.) doesn't apply because we are 
apparently trying to call the method from an inner class of a subclass of 
_BufferedRowIterator_.

Looking at the Code path of _WholeStageCodeGen_, the following happens:
 # In 
[WholeStageCodeGen|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala#L527],
 we create the subclass of _BufferedRowIterator_, along with a _processNext_ 
method for processing the output of the child plan.
 # In the child, which is a 
[HashAggregateExec|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L517],
 we create the method which shows up at the top of the stack trace (called 
_doAggregateWithKeysOutput_ )
 # We add this method to the compiled code invoking _addNewFunction_ of 
[CodeGenerator|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L460]

Now, this method states that:
{noformat}
If the code for the `OuterClass` grows too large, the function will be inlined 
into a new private, inner class
{noformat}
This indeed seems to happen: the _doAggregateWithKeysOutput_ method is put into 
a new private inner class. Thus, it doesn't have access to the protected 
_append_ method anymore but still tries to call it, which results in the 
_IllegalAccessError._ 

Possible fixes:
 * Pass in the _inlineToOuterClass_ flag when invoking the _addNewFunction_
 * Make the _append_ method public
 * Re-declare the _append_ method in the generated subclass (just invoking 
_super_). This way, inner classes should have access to it.

 

  was:
Got the following stacktrace for a large QueryPlan using WholeStageCodeGen:
{noformat}
java.lang.IllegalAccessError: tried to access method 
org.apache.spark.sql.execution.BufferedRowIterator.append(Lorg/apache/spark/sql/catalyst/InternalRow;)V
 from class 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass.agg_doAggregateWithKeysOutput$(Unknown
 Source)
at 

[jira] [Updated] (SPARK-23598) WholeStageCodegen can lead to IllegalAccessError calling append for HashAggregateExec

2018-03-05 Thread David Vogelbacher (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Vogelbacher updated SPARK-23598:
--
Description: 
Got the following stacktrace for a large QueryPlan using WholeStageCodeGen:
{noformat}
java.lang.IllegalAccessError: tried to access method 
org.apache.spark.sql.execution.BufferedRowIterator.append(Lorg/apache/spark/sql/catalyst/InternalRow;)V
 from class 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass.agg_doAggregateWithKeysOutput$(Unknown
 Source)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at 
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345){noformat}
After disabling codegen, everything works.

The root cause seems to be that we are trying to call the protected _append_ 
method of 
[BufferedRowIterator|https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/BufferedRowIterator.java#L68]
 from an inner-class of a sub-class that is loaded by a different class-loader 
(after codegen compilation).

[https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-5.html#jvms-5.4.4] 
states that a protected method _R_ can be accessed only if one of the following 
two conditions is fulfilled:
 # R is protected and is declared in a class C, and D is either a subclass of C 
or C itself. Furthermore, if R is not static, then the symbolic reference to R 
must contain a symbolic reference to a class T, such that T is either a 
subclass of D, a superclass of D, or D itself.
 # R is either protected or has default access (that is, neither public nor 
protected nor private), and is declared by a class in the same run-time package 
as D.

2.) doesn't apply as we have loaded the class with a different class loader 
(and are in a different package) and 1.) doesn't apply because we are 
apparently trying to call the method from an inner class of a subclass of 
_BufferedRowIterator_.

Looking at the Code path of _WholeStageCodeGen_, the following happens:
 # In 
[WholeStageCodeGen|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala#L527],
 we create the subclass of _BufferedRowIterator_, along with a _processNext_ 
method for processing the output of the child plan.
 # In the child, which is a 
[HashAggregateExec|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L517],
 we create the method which shows up at the top of the stack trace (called 
_doAggregateWithKeysOutput_ )
 # We add this method to the compiled code invoking _addNewFunction_ of 
[CodeGenerator|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L460]

Now, this method states that:
{noformat}
If the code for the `OuterClass` grows too large, the function will be inlined 
into a new private, inner class
{noformat}
This indeed seems to happen: the _doAggregateWithKeysOutput_ method is put into 
a new private inner class. Thus, it doesn't have access to the protected 
_append_ method anymore but still tries to call it, which results in the 
_IllegalAccessError._ 

Possible fixes:
 * Pass in the _inlineToOuterClass_ flag when invoking _addNewFunction_
 * Make the _append_ method public
 * Re-declare the _append_ method in the generated subclass (just invoking 
_super_). This way, inner classes should have access to it.

 

  was:
Got the following stacktrace for a large QueryPlan using WholeStageCodeGen:
{noformat}
java.lang.IllegalAccessError: tried to access method 
org.apache.spark.sql.execution.BufferedRowIterator.append(Lorg/apache/spark/sql/catalyst/InternalRow;)V
 from class 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass.agg_doAggregateWithKeysOutput$(Unknown
 Source)
at 

[jira] [Updated] (SPARK-23598) WholeStageCodegen can lead to IllegalAccessError calling append for HashAggregateExec

2018-03-05 Thread David Vogelbacher (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Vogelbacher updated SPARK-23598:
--
Description: 
Got the following stacktrace for a large QueryPlan using WholeStageCodeGen:
{noformat}
java.lang.IllegalAccessError: tried to access method 
org.apache.spark.sql.execution.BufferedRowIterator.append(Lorg/apache/spark/sql/catalyst/InternalRow;)V
 from class 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass.agg_doAggregateWithKeysOutput$(Unknown
 Source)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at 
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345){noformat}
After disabling codegen, everything works.

The root cause seems to be that we are trying to call the protected _append_ 
method of 
[BufferedRowIterator|https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/BufferedRowIterator.java#L68]
 from an inner-class of a sub-class that is loaded by a different 
class-loader (after codegen compilation).

[https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-5.html#jvms-5.4.4] 
states that a protected method _R_ can be accessed only if one of the following 
two conditions is fulfilled:
 # R is protected and is declared in a class C, and D is either a subclass of C 
or C itself. Furthermore, if R is not static, then the symbolic reference to R 
must contain a symbolic reference to a class T, such that T is either a 
subclass of D, a superclass of D, or D itself.
 # R is either protected or has default access (that is, neither public nor 
protected nor private), and is declared by a class in the same run-time package 
as D.

2.) doesn't apply as we have loaded the class with a different class loader 
(and are in a different package) and 1.) doesn't apply because we are 
apparently trying to call the method from an inner class of a subclass of 
_BufferedRowIterator_.

Looking at the Code path of _WholeStageCodeGen_, the following happens:
 # In 
[WholeStageCodeGen|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala#L527],
 we create the subclass of _BufferedRowIterator_, along with a _processNext_ 
method for processing the output of the child plan.
 # In the child, which is a 
[HashAggregateExec|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L517],
 we create the method which shows up at the top of the stack trace (called 
_doAggregateWithKeysOutput_ )
 # We add this method to the compiled code invoking _addNewFunction_ of 
[CodeGenerator|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L460]

Now, this method states that:
{noformat}
If the code for the `OuterClass` grows too large, the function will be inlined 
into a new private, inner class
{noformat}
This indeed seems to happen: the _doAggregateWithKeysOutput_ method is put into 
a new private inner class. Thus, it doesn't have access to the protected 
_append_ method anymore but still tries to call it, which results in the 
_IllegalAccessError._ 

Possible fixes:
 * Pass in the _inlineToOuterClass_ flag when invoking _addNewFunction_
 * Make the _append_ method public
 * Re-declare the _append_ method in the generated subclass (just invoking 
_super_). This way, inner classes should have access to it.

 

  was:
Got the following stacktrace for a large QueryPlan using WholeStageCodeGen:
{code:java}
java.lang.IllegalAccessError: tried to access method 
org.apache.spark.sql.execution.BufferedRowIterator.append(Lorg/apache/spark/sql/catalyst/InternalRow;)V
 from class 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass.agg_doAggregateWithKeysOutput$(Unknown
 Source)
at 

[jira] [Created] (SPARK-23598) WholeStageCodegen can lead to IllegalAccessError calling append for HashAggregateExec

2018-03-05 Thread David Vogelbacher (JIRA)
David Vogelbacher created SPARK-23598:
-

 Summary: WholeStageCodegen can lead to IllegalAccessError  calling 
append for HashAggregateExec
 Key: SPARK-23598
 URL: https://issues.apache.org/jira/browse/SPARK-23598
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.4.0
Reporter: David Vogelbacher


Got the following stacktrace for a large QueryPlan using WholeStageCodeGen:
{code:java}
java.lang.IllegalAccessError: tried to access method 
org.apache.spark.sql.execution.BufferedRowIterator.append(Lorg/apache/spark/sql/catalyst/InternalRow;)V
 from class 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass.agg_doAggregateWithKeysOutput$(Unknown
 Source)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at 
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
{code}

After disabling codegen, everything works.

The root cause seems to be that we are trying to call the protected _append_ 
method of 
[BufferedRowIterator|https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/BufferedRowIterator.java#L68]
 from an inner-class of a sub-class that is loaded by a different 
class-loader (after codegen compilation).

[https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-5.html#jvms-5.4.4] 
states that a protected method _R_ can be accessed only if one of the following 
two conditions is fulfilled:
 # R is protected and is declared in a class C, and D is either a subclass of C 
or C itself. Furthermore, if R is not static, then the symbolic reference to R 
must contain a symbolic reference to a class T, such that T is either a 
subclass of D, a superclass of D, or D itself.
 # R is either protected or has default access (that is, neither public nor 
protected nor private), and is declared by a class in the same run-time package 
as D.

2.) doesn't apply as we have loaded the class with a different class loader 
(and are in a different package) and 1.) doesn't apply because we are 
apparently trying to call the method from an inner class of a subclass of 
_BufferedRowIterator_.



Looking at the Code path of _WholeStageCodeGen_, the following happens:
 # In 
[WholeStageCodeGen|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala#L527],
 we create the subclass of _BufferedRowIterator_, along with a _processNext_ 
method for processing the output of the child plan.
 # In the child, which is a 
[HashAggregateExec|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L517],
 we create the method which shows up at the top of the stack trace (called 
_doAggregateWithKeysOutput_ )
 # We add this method to the compiled code invoking _addNewFunction_ of 
[CodeGenerator|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L460]

Now, this method states that:
{noformat}
If the code for the `OuterClass` grows too large, the function will be inlined 
into a new private, inner class
{noformat}
This indeed seems to happen: the _doAggregateWithKeysOutput_ method is put into 
a new private inner class. Thus, it doesn't have access to the protected 
_append_ method anymore but still tries to call it, which results in the 
_IllegalAccessError._ 

Possible fixes:
 * Pass in the _inlineToOuterClass_ flag when invoking _addNewFunction_
 * Make the _append_ method public
 * Re-declare the _append_ method in the generated subclass (just invoking 
_super_). This way, inner classes should have access to it.
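
A rough Scala sketch of the shape of the last option (re-declare and delegate to _super_). The class names are illustrative stand-ins, not Spark classes, and the sketch does not reproduce the class-loader split that triggers the actual error in the generated Java code:
{code:scala}
// Stand-in for BufferedRowIterator (normally loaded by the regular class loader).
abstract class ParentIterator {
  protected def append(row: AnyRef): Unit = println(s"appended $row")
}

// Stand-in for the generated subclass produced by WholeStageCodegen.
class GeneratedIterator extends ParentIterator {
  // Re-declaring append in the subclass (just delegating to super) gives code
  // in this class a locally declared method to call, instead of the protected
  // method of a class that ends up in a different runtime package.
  override protected def append(row: AnyRef): Unit = super.append(row)

  // Stand-in for the private inner class that addNewFunction spills large
  // functions into (e.g. agg_doAggregateWithKeysOutput in the stack trace).
  private class AggNested {
    def emit(row: AnyRef): Unit = append(row)
  }

  def run(): Unit = new AggNested().emit("row-0")
}

object Demo extends App {
  new GeneratedIterator().run() // prints "appended row-0"
}
{code}
As noted in the description, disabling codegen works around the issue; for whole-stage codegen that means setting {{spark.sql.codegen.wholeStage}} to {{false}}, at the cost of losing the generated code path.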

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23597) Audit Spark SQL code base for non-interpreted expressions

2018-03-05 Thread Herman van Hovell (JIRA)
Herman van Hovell created SPARK-23597:
-

 Summary: Audit Spark SQL code base for non-interpreted expressions
 Key: SPARK-23597
 URL: https://issues.apache.org/jira/browse/SPARK-23597
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.3.0
Reporter: Herman van Hovell


We want to eliminate expressions that do not provide an interpreted execution 
path from the code base. The goal of this ticket is to check whether there are 
any others besides the ones being addressed by SPARK-23580.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23596) Modify Dataset test harness to include interpreted execution

2018-03-05 Thread Herman van Hovell (JIRA)
Herman van Hovell created SPARK-23596:
-

 Summary: Modify Dataset test harness to include interpreted 
execution
 Key: SPARK-23596
 URL: https://issues.apache.org/jira/browse/SPARK-23596
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.3.0
Reporter: Herman van Hovell


We should modify the Dataset test harness to also test the interpreted code 
paths. This task can be started as soon as a significant subset of the object 
related Expressions provides an interpreted fallback.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23595) Add interpreted execution for ValidateExternalType expression

2018-03-05 Thread Herman van Hovell (JIRA)
Herman van Hovell created SPARK-23595:
-

 Summary: Add interpreted execution for ValidateExternalType 
expression
 Key: SPARK-23595
 URL: https://issues.apache.org/jira/browse/SPARK-23595
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.3.0
Reporter: Herman van Hovell






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23594) Add interpreted execution for GetExternalRowField expression

2018-03-05 Thread Herman van Hovell (JIRA)
Herman van Hovell created SPARK-23594:
-

 Summary: Add interpreted execution for GetExternalRowField 
expression
 Key: SPARK-23594
 URL: https://issues.apache.org/jira/browse/SPARK-23594
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.3.0
Reporter: Herman van Hovell






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23593) Add interpreted execution for InitializeJavaBean expression

2018-03-05 Thread Herman van Hovell (JIRA)
Herman van Hovell created SPARK-23593:
-

 Summary: Add interpreted execution for InitializeJavaBean 
expression
 Key: SPARK-23593
 URL: https://issues.apache.org/jira/browse/SPARK-23593
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.3.0
Reporter: Herman van Hovell






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23592) Add interpreted execution for DecodeUsingSerializer expression

2018-03-05 Thread Herman van Hovell (JIRA)
Herman van Hovell created SPARK-23592:
-

 Summary: Add interpreted execution for DecodeUsingSerializer 
expression
 Key: SPARK-23592
 URL: https://issues.apache.org/jira/browse/SPARK-23592
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.3.0
Reporter: Herman van Hovell






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23591) Add interpreted execution for EncodeUsingSerializer expression

2018-03-05 Thread Herman van Hovell (JIRA)
Herman van Hovell created SPARK-23591:
-

 Summary: Add interpreted execution for EncodeUsingSerializer 
expression
 Key: SPARK-23591
 URL: https://issues.apache.org/jira/browse/SPARK-23591
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.3.0
Reporter: Herman van Hovell






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23590) Add interpreted execution for CreateExternalRow

2018-03-05 Thread Herman van Hovell (JIRA)
Herman van Hovell created SPARK-23590:
-

 Summary: Add interpreted execution for CreateExternalRow
 Key: SPARK-23590
 URL: https://issues.apache.org/jira/browse/SPARK-23590
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.3.0
Reporter: Herman van Hovell






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23590) Add interpreted execution for CreateExternalRow expression

2018-03-05 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell updated SPARK-23590:
--
Summary: Add interpreted execution for CreateExternalRow expression  (was: 
Add interpreted execution for CreateExternalRow)

> Add interpreted execution for CreateExternalRow expression
> --
>
> Key: SPARK-23590
> URL: https://issues.apache.org/jira/browse/SPARK-23590
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Herman van Hovell
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23589) Add interpreted execution for ExternalMapToCatalyst expression

2018-03-05 Thread Herman van Hovell (JIRA)
Herman van Hovell created SPARK-23589:
-

 Summary: Add interpreted execution for ExternalMapToCatalyst 
expression
 Key: SPARK-23589
 URL: https://issues.apache.org/jira/browse/SPARK-23589
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.3.0
Reporter: Herman van Hovell






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23588) Add interpreted execution for CatalystToExternalMap expression

2018-03-05 Thread Herman van Hovell (JIRA)
Herman van Hovell created SPARK-23588:
-

 Summary: Add interpreted execution for CatalystToExternalMap 
expression
 Key: SPARK-23588
 URL: https://issues.apache.org/jira/browse/SPARK-23588
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.3.0
Reporter: Herman van Hovell






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23587) Add interpreted execution for MapObjects expression

2018-03-05 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell updated SPARK-23587:
--
Issue Type: Sub-task  (was: Bug)
Parent: SPARK-23580

> Add interpreted execution for MapObjects expression
> ---
>
> Key: SPARK-23587
> URL: https://issues.apache.org/jira/browse/SPARK-23587
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Herman van Hovell
>Priority: Major
>
> Add interpreted execution for {{MapObjects}} expression.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23587) Add interpreted execution for MapObjects expression

2018-03-05 Thread Herman van Hovell (JIRA)
Herman van Hovell created SPARK-23587:
-

 Summary: Add interpreted execution for MapObjects expression
 Key: SPARK-23587
 URL: https://issues.apache.org/jira/browse/SPARK-23587
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.0
Reporter: Herman van Hovell


Add interpreted execution for {{MapObjects}} expression.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23586) Add interpreted execution for WrapOption expression

2018-03-05 Thread Herman van Hovell (JIRA)
Herman van Hovell created SPARK-23586:
-

 Summary: Add interpreted execution for WrapOption expression
 Key: SPARK-23586
 URL: https://issues.apache.org/jira/browse/SPARK-23586
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.3.0
Reporter: Herman van Hovell






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23585) Add interpreted execution for UnwrapOption expression

2018-03-05 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell updated SPARK-23585:
--
Summary: Add interpreted execution for UnwrapOption expression  (was: Add 
interpreted execition for UnwrapOption expression)

> Add interpreted execution for UnwrapOption expression
> -
>
> Key: SPARK-23585
> URL: https://issues.apache.org/jira/browse/SPARK-23585
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Herman van Hovell
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23582) Add interpreted execution to StaticInvoke expression

2018-03-05 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell updated SPARK-23582:
--
Summary: Add interpreted execution to StaticInvoke expression  (was: Add 
interpreted execution to StaticInvoke)

> Add interpreted execution to StaticInvoke expression
> 
>
> Key: SPARK-23582
> URL: https://issues.apache.org/jira/browse/SPARK-23582
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Herman van Hovell
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23585) Add interpreted execition for UnwrapOption expression

2018-03-05 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell updated SPARK-23585:
--
Summary: Add interpreted execition for UnwrapOption expression  (was: Add 
interpreted mode for UnwrapOption expression)

> Add interpreted execition for UnwrapOption expression
> -
>
> Key: SPARK-23585
> URL: https://issues.apache.org/jira/browse/SPARK-23585
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Herman van Hovell
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23584) Add interpreted execution to NewInstance expression

2018-03-05 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell updated SPARK-23584:
--
Issue Type: Sub-task  (was: Bug)
Parent: SPARK-23580

> Add interpreted execution to NewInstance expression
> ---
>
> Key: SPARK-23584
> URL: https://issues.apache.org/jira/browse/SPARK-23584
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Herman van Hovell
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23585) Add interpreted mode for UnwrapOption expression

2018-03-05 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell updated SPARK-23585:
--
Issue Type: Sub-task  (was: Bug)
Parent: SPARK-23580

> Add interpreted mode for UnwrapOption expression
> 
>
> Key: SPARK-23585
> URL: https://issues.apache.org/jira/browse/SPARK-23585
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Herman van Hovell
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23584) Add interpreted execution to NewInstance expression

2018-03-05 Thread Herman van Hovell (JIRA)
Herman van Hovell created SPARK-23584:
-

 Summary: Add interpreted execution to NewInstance expression
 Key: SPARK-23584
 URL: https://issues.apache.org/jira/browse/SPARK-23584
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.0
Reporter: Herman van Hovell






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23585) Add interpreted mode for UnwrapOption expression

2018-03-05 Thread Herman van Hovell (JIRA)
Herman van Hovell created SPARK-23585:
-

 Summary: Add interpreted mode for UnwrapOption expression
 Key: SPARK-23585
 URL: https://issues.apache.org/jira/browse/SPARK-23585
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.0
Reporter: Herman van Hovell






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23582) Add interpreted execution to StaticInvoke

2018-03-05 Thread Herman van Hovell (JIRA)
Herman van Hovell created SPARK-23582:
-

 Summary: Add interpreted execution to StaticInvoke
 Key: SPARK-23582
 URL: https://issues.apache.org/jira/browse/SPARK-23582
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.3.0
Reporter: Herman van Hovell






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23583) Add interpreted execution to Invoke expression

2018-03-05 Thread Herman van Hovell (JIRA)
Herman van Hovell created SPARK-23583:
-

 Summary: Add interpreted execution to Invoke expression
 Key: SPARK-23583
 URL: https://issues.apache.org/jira/browse/SPARK-23583
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.3.0
Reporter: Herman van Hovell






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23581) Add an interpreted version of GenerateUnsafeProjection

2018-03-05 Thread Herman van Hovell (JIRA)
Herman van Hovell created SPARK-23581:
-

 Summary: Add an interpreted version of GenerateUnsafeProjection
 Key: SPARK-23581
 URL: https://issues.apache.org/jira/browse/SPARK-23581
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.3.0
Reporter: Herman van Hovell


GenerateUnsafeProjection should have an interpreted cousin. See the parent 
ticket for the motivation.
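A rough, self-contained sketch of what an interpreted projection amounts to, assuming simplified row and expression types (not Spark's real InternalRow/UnsafeRow classes): each output column is produced by evaluating its expression against the input row, instead of compiling a specialized projection class.

{code:scala}
// Illustrative only: an interpreted projection loops over the bound expressions
// and evaluates each one against the input row, with no code generation involved.
object InterpretedProjectionSketch extends App {
  type Row = Seq[Any]
  trait Expr { def eval(row: Row): Any }

  class InterpretedProjection(exprs: Seq[Expr]) extends (Row => Row) {
    override def apply(row: Row): Row = exprs.map(_.eval(row))
  }

  val col0    = new Expr { def eval(row: Row): Any = row(0) }
  val plusOne = new Expr { def eval(row: Row): Any = row(1).asInstanceOf[Int] + 1 }

  val project = new InterpretedProjection(Seq(col0, plusOne))
  println(project(Seq("a", 41))) // List(a, 42)
}
{code}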



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-23580) Interpreted mode fallback should be implemented for all expressions & projections

2018-03-05 Thread Herman van Hovell (JIRA)
Herman van Hovell created SPARK-23580:
-

 Summary: Interpreted mode fallback should be implemented for all 
expressions & projections
 Key: SPARK-23580
 URL: https://issues.apache.org/jira/browse/SPARK-23580
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.3.0
Reporter: Herman van Hovell


Spark SQL currently does not support interpreted mode for all expressions and 
projections. This is a problem for scenarios where code generation does not work, 
or where the generated code blows past the JVM class limits: we currently cannot 
fall back gracefully.

This ticket is an umbrella to fix this class of problem in Spark SQL. The work 
can be divided into two main areas (a minimal sketch of the fallback pattern 
follows the list):
- Add interpreted versions of all dataset-related expressions.
- Add an interpreted version of {{GenerateUnsafeProjection}}.
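As a hedged illustration of the umbrella goal (names assumed for the sketch, not Spark's actual factory methods), the fallback pattern amounts to trying the codegen path and, if compilation fails, building an interpreted equivalent instead of failing the query:

{code:scala}
// Self-contained sketch of "try codegen, fall back to interpreted". The codegen
// path is simulated to always fail here so the fallback branch is exercised.
object FallbackSketch extends App {
  type Row = Seq[Any]
  trait Projection extends (Row => Row)

  // Interpreted path: evaluates each column function directly, always available.
  class InterpretedProjection(fns: Seq[Row => Any]) extends Projection {
    def apply(row: Row): Row = fns.map(_(row))
  }

  // Stand-in for the codegen path; real failures include generated methods that
  // exceed JVM bytecode limits.
  def createCodegenProjection(fns: Seq[Row => Any]): Projection =
    throw new RuntimeException("simulated: generated code exceeds JVM method size limit")

  def createProjection(fns: Seq[Row => Any]): Projection =
    try createCodegenProjection(fns)
    catch { case _: Exception => new InterpretedProjection(fns) } // graceful fallback

  val projection = createProjection(Seq(r => r(0), r => r(1).asInstanceOf[Int] * 2))
  println(projection(Seq("x", 21))) // List(x, 42)
}
{code}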



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23510) Support read data from Hive 2.2 and Hive 2.3 metastore

2018-03-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386010#comment-16386010
 ] 

Apache Spark commented on SPARK-23510:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/20734

> Support read data from Hive 2.2 and Hive 2.3 metastore
> --
>
> Key: SPARK-23510
> URL: https://issues.apache.org/jira/browse/SPARK-23510
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 2.4.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23546) Refactor non-stateful methods/values in CodegenContext

2018-03-05 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-23546.
---
  Resolution: Fixed
Assignee: Kazuaki Ishizaki
Target Version/s: 2.3.0

> Refactor non-stateful methods/values in CodegenContext
> --
>
> Key: SPARK-23546
> URL: https://issues.apache.org/jira/browse/SPARK-23546
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>Assignee: Kazuaki Ishizaki
>Priority: Major
>
> The current {{CodegenContext}} class also contains values and methods that do 
> not depend on any mutable state.
> This refactoring moves them to the {{CodeGenerator}} object, which can be 
> accessed from anywhere in the program without an instantiated 
> {{CodegenContext}}.
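The refactoring shape, in a hedged sketch with simplified stand-in names (not the actual {{CodegenContext}}/{{CodeGenerator}} code): per-instance mutable state stays on the class, while helpers that depend only on their arguments move to an object and become callable without an instance.

{code:scala}
// Stateful part: must remain an instance, since it accumulates state while
// generating code for a single query plan.
class CodegenCtx {
  private var mutableStateCount = 0
  def addMutableState(): Int = { mutableStateCount += 1; mutableStateCount }
}

// Stateless part: depends only on its arguments, so it can live on an object
// and be used anywhere without instantiating CodegenCtx.
object CodegenHelper {
  def boxedType(primitive: String): String = primitive match {
    case "int"    => "java.lang.Integer"
    case "long"   => "java.lang.Long"
    case "double" => "java.lang.Double"
    case other    => other
  }
}

object RefactorSketch extends App {
  println(CodegenHelper.boxedType("int")) // java.lang.Integer
}
{code}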



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23516) I think it is unnecessary to transfer unroll memory to storage memory

2018-03-05 Thread liuxian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liuxian resolved SPARK-23516.
-
Resolution: Invalid

> I think it is unnecessary to transfer unroll memory to storage memory 
> --
>
> Key: SPARK-23516
> URL: https://issues.apache.org/jira/browse/SPARK-23516
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: liuxian
>Priority: Minor
>
> In fact, unroll memory is also storage memory, so I think it is unnecessary to 
> actually release the unroll memory and then acquire storage memory again.
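To make the reporter's point concrete, a toy bookkeeping sketch (not Spark's MemoryStore/MemoryManager API): the behaviour described above releases the unroll bytes and then re-acquires the same amount as storage bytes, which a direct transfer would collapse into a single accounting move.

{code:scala}
// Toy accounting only; illustrates release-then-acquire versus a single transfer.
object UnrollTransferSketch extends App {
  var unrollBytes  = 0L
  var storageBytes = 0L

  def acquireUnroll(n: Long): Unit = unrollBytes += n

  // Behaviour described in the ticket: two separate bookkeeping steps.
  def releaseUnrollThenAcquireStorage(n: Long): Unit = {
    unrollBytes  -= n
    storageBytes += n
  }

  acquireUnroll(1024)
  releaseUnrollThenAcquireStorage(1024)
  println(s"unroll=$unrollBytes storage=$storageBytes") // unroll=0 storage=1024
}
{code}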



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


