[jira] [Created] (SPARK-23600) conda_panda_example test fails to import panda lib with Spark 2.3
Supreeth Sharma created SPARK-23600: --- Summary: conda_panda_example test fails to import panda lib with Spark 2.3 Key: SPARK-23600 URL: https://issues.apache.org/jira/browse/SPARK-23600 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 2.3.0 Environment: ambari-server --version 2.7.0.2-64 HDP-3.0.0.2-132 Reporter: Supreeth Sharma Fix For: 2.3.0 With Spark 2.3, the conda panda test fails to import pandas. Python version: Python 2.7.5 1) Create the requirements file. virtual_env_type : Native {code:java} packaging==16.8 panda==0.3.1 pyparsing==2.1.10 requests==2.13.0 six==1.10.0 numpy==1.12.0 pandas==0.19.2 python-dateutil==2.6.0 pytz==2016.10 {code} virtual_env_type : conda {code:java} mkl=2017.0.1=0 numpy=1.12.0=py27_0 openssl=1.0.2k=0 pandas=0.19.2=np112py27_1 pip=9.0.1=py27_1 python=2.7.13=0 python-dateutil=2.6.0=py27_0 pytz=2016.10=py27_0 readline=6.2=2 setuptools=27.2.0=py27_0 six=1.10.0=py27_0 sqlite=3.13.0=0 tk=8.5.18=0 wheel=0.29.0=py27_0 zlib=1.2.8=3 {code} 2) Run the conda panda test. {code:java} spark-submit --master yarn-client --jars /usr/hdp/current/hadoop-client/lib/hadoop-lzo-0.6.0.3.0.0.2-132.jar --conf spark.pyspark.virtualenv.enabled=true --conf spark.pyspark.virtualenv.type=native --conf spark.pyspark.virtualenv.requirements=/tmp/requirements.txt --conf spark.pyspark.virtualenv.bin.path=/usr/bin/virtualenv /hwqe/hadoopqe/tests/spark/data/conda_panda_example.py 2>&1 | tee /tmp/1/Spark_clientLogs/pyenv_conda_panda_example_native_yarn-client.log {code} 3) The application fails to import pandas. {code:java} 2018-03-05 13:43:31,493|INFO|MainThread|machine.py:167 - run()||GUID=a3cb88f7-bf55-4d9e-9cfe-3e44eae3a72b|18/03/05 13:43:31 INFO YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8 2018-03-05 13:43:31,527|INFO|MainThread|machine.py:167 - run()||GUID=a3cb88f7-bf55-4d9e-9cfe-3e44eae3a72b|Traceback (most recent call last): 2018-03-05 13:43:31,527|INFO|MainThread|machine.py:167 - run()||GUID=a3cb88f7-bf55-4d9e-9cfe-3e44eae3a72b|File "/hwqe/hadoopqe/tests/spark/data/conda_panda_example.py", line 5, in 2018-03-05 13:43:31,528|INFO|MainThread|machine.py:167 - run()||GUID=a3cb88f7-bf55-4d9e-9cfe-3e44eae3a72b|import pandas as pd 2018-03-05 13:43:31,528|INFO|MainThread|machine.py:167 - run()||GUID=a3cb88f7-bf55-4d9e-9cfe-3e44eae3a72b|ImportError: No module named pandas 2018-03-05 13:43:31,547|INFO|MainThread|machine.py:167 - run()||GUID=a3cb88f7-bf55-4d9e-9cfe-3e44eae3a72b|18/03/05 13:43:31 INFO BlockManagerMasterEndpoint: Registering block manager ctr-e138-1518143905142-67599-01-05.hwx.site:44861 with 366.3 MB RAM, BlockManagerId(2, ctr-e138-1518143905142-67599-01-05.hwx.site, 44861, None){code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-23496) Locality of coalesced partitions can be severely skewed by the order of input partitions
[ https://issues.apache.org/jira/browse/SPARK-23496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell resolved SPARK-23496. --- Resolution: Fixed Assignee: Ala Luszczak Fix Version/s: 2.4.0 > Locality of coalesced partitions can be severely skewed by the order of input > partitions > > > Key: SPARK-23496 > URL: https://issues.apache.org/jira/browse/SPARK-23496 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Ala Luszczak >Assignee: Ala Luszczak >Priority: Major > Fix For: 2.4.0 > > > Example: > Consider RDD "R" with 100 partitions, half of which have locality preference > "hostA" and half have "hostB". > * Assume odd-numbered input partitions of R prefer "hostA" and even-numbered > prefer "hostB". Then R.coalesce(50) will have 25 partitions with preference > "hostA" and 25 with "hostB" (even distribution). > * Assume partitions with index 0-49 of R prefer "hostA" and partitions with > index 50-99 prefer "hostB". Then R.coalesce(50) will have 49 partitions with > "hostA" and 1 with "hostB" (extremely skewed distribution). > > The algorithm in {{DefaultPartitionCoalescer.setupGroups}} is responsible for > picking preferred locations for coalesced partitions. It analyzes the > preferred locations of input partitions. It starts by trying to create one > partition for each unique location in the input. However, if the > requested number of coalesced partitions is higher than the number of unique > locations, it has to pick duplicate locations. > Currently, the duplicate locations are picked by iterating over the input > partitions in order, and copying their preferred locations to coalesced > partitions. If the input partitions are clustered by location, this can > result in severe skew. > Instead of iterating over the list of input partitions in order, we should > pick them at random. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
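The effect of the input ordering described in SPARK-23496 can be illustrated with a small, self-contained model. The following Scala sketch is illustrative only: it simplifies {{DefaultPartitionCoalescer.setupGroups}} (which first creates one group per distinct location, hence the 49/1 rather than 50/0 split in the ticket's example) and just compares taking group seeds in input order with sampling them at random.

{code:scala}
import scala.util.Random

object CoalesceLocalitySketch extends App {
  // 100 input partitions, two distinct preferred locations, coalesced to 50 groups.
  // Each group inherits the preferred location of the input partition used as its seed.
  val interleaved = (0 until 100).map(i => if (i % 2 == 1) "hostA" else "hostB")
  val clustered   = (0 until 100).map(i => if (i < 50) "hostA" else "hostB")

  // Simplified current behaviour: seeds are taken by iterating input partitions in order.
  def seedsInOrder(prefs: Seq[String], numGroups: Int): Seq[String] =
    prefs.take(numGroups)

  // Proposed behaviour: pick the seed partitions at random.
  def seedsAtRandom(prefs: Seq[String], numGroups: Int, seed: Long = 42L): Seq[String] =
    new Random(seed).shuffle(prefs).take(numGroups)

  def distribution(seeds: Seq[String]): Map[String, Int] =
    seeds.groupBy(identity).map { case (host, s) => host -> s.size }

  println(distribution(seedsInOrder(interleaved, 50))) // 25 hostA / 25 hostB (even)
  println(distribution(seedsInOrder(clustered, 50)))   // 50 hostA / 0 hostB (skewed)
  println(distribution(seedsAtRandom(clustered, 50)))  // roughly even in expectation
}
{code}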
[jira] [Created] (SPARK-23599) The UUID() expression is too non-deterministic
Herman van Hovell created SPARK-23599: - Summary: The UUID() expression is too non-deterministic Key: SPARK-23599 URL: https://issues.apache.org/jira/browse/SPARK-23599 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.3.0 Reporter: Herman van Hovell The current {{Uuid()}} expression uses {{java.util.UUID.randomUUID}} for UUID generation. There are a couple of major problems with this: - It is non-deterministic across task retries. This breaks Spark's processing model, and this will lead to very hard to trace bugs, like non-deterministic shuffles, duplicates and missing rows. - It uses a single secure random for UUID generation. This uses a single JVM wide lock, and this can lead to lock contention and other performance problems. We should move to something that is deterministic between retries. This can be done by using seeded PRNGs for which we set the seed during planning. It is important here to use a PRNG that provides enough entropy for creating a proper UUID. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
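A minimal sketch of the proposed direction, with assumed names ({{DeterministicUuid}} is not part of Spark): seed a PRNG from a value fixed at planning time plus the partition index, so a retried task regenerates the same UUIDs. {{java.util.Random}} is used only to keep the sketch short; as the ticket notes, a real implementation needs a seeded PRNG with enough entropy for proper UUIDs.

{code:scala}
import java.util.{Random, UUID}

// Hypothetical sketch: a planning-time seed combined with the partition index
// makes the generated UUIDs repeatable across task retries.
class DeterministicUuid(planSeed: Long, partitionIndex: Int) {
  // java.util.Random is a placeholder; a production version would use a
  // higher-quality seeded PRNG, as the ticket points out.
  private val rng = new Random(planSeed * 31 + partitionIndex)

  def next(): UUID = {
    var msb = rng.nextLong()
    var lsb = rng.nextLong()
    // Force the version (4) and IETF variant bits so the value is a well-formed UUID.
    msb = (msb & 0xffffffffffff0fffL) | 0x0000000000004000L
    lsb = (lsb & 0x3fffffffffffffffL) | 0x8000000000000000L
    new UUID(msb, lsb)
  }
}

object DeterministicUuid {
  def main(args: Array[String]): Unit = {
    val first = new DeterministicUuid(planSeed = 42L, partitionIndex = 3)
    val retry = new DeterministicUuid(planSeed = 42L, partitionIndex = 3) // simulated task retry
    assert(first.next() == retry.next()) // same seed => same UUID after a retry
  }
}
{code}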
[jira] [Updated] (SPARK-23599) The UUID() expression is too non-deterministic
[ https://issues.apache.org/jira/browse/SPARK-23599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell updated SPARK-23599: -- Priority: Critical (was: Major) > The UUID() expression is too non-deterministic > -- > > Key: SPARK-23599 > URL: https://issues.apache.org/jira/browse/SPARK-23599 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Herman van Hovell >Priority: Critical > > The current {{Uuid()}} expression uses {{java.util.UUID.randomUUID}} for UUID > generation. There are a couple of major problems with this: > - It is non-deterministic across task retries. This breaks Spark's processing > model, and this will lead to very hard to trace bugs, like non-deterministic > shuffles, duplicates and missing rows. > - It uses a single secure random for UUID generation. This uses a single JVM > wide lock, and this can lead to lock contention and other performance > problems. > We should move to something that is deterministic between retries. This can > be done by using seeded PRNGs for which we set the seed during planning. It > is important here to use a PRNG that provides enough entropy for creating a > proper UUID. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-23598) WholeStageCodegen can lead to IllegalAccessError calling append for HashAggregateExec
[ https://issues.apache.org/jira/browse/SPARK-23598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Vogelbacher updated SPARK-23598: -- Description: Got the following stacktrace for a large QueryPlan using WholeStageCodeGen: {noformat} java.lang.IllegalAccessError: tried to access method org.apache.spark.sql.execution.BufferedRowIterator.append(Lorg/apache/spark/sql/catalyst/InternalRow;)V from class org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass.agg_doAggregateWithKeysOutput$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53) at org.apache.spark.scheduler.Task.run(Task.scala:109) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345){noformat} After disabling codegen, everything works. The root cause seems to be that we are trying to call the protected _append_ method of [BufferedRowIterator|https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/BufferedRowIterator.java#L68] from an inner-class of a sub-class that is loaded by a different class-loader (after codegen compilation). [https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-5.html#jvms-5.4.4] states that a protected method _R_ can be accessed only if one of the following two conditions is fulfilled: # R is protected and is declared in a class C, and D is either a subclass of C or C itself. Furthermore, if R is not static, then the symbolic reference to R must contain a symbolic reference to a class T, such that T is either a subclass of D, a superclass of D, or D itself. # R is either protected or has default access (that is, neither public nor protected nor private), and is declared by a class in the same run-time package as D. 2.) doesn't apply as we have loaded the class with a different class loader (and are in a different package) and 1.) doesn't apply because we are apparently trying to call the method from an inner class of a subclass of _BufferedRowIterator_. Looking at the Code path of _WholeStageCodeGen_, the following happens: # In [WholeStageCodeGen|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala#L527], we create the subclass of _BufferedRowIterator_, along with a _processNext_ method for processing the output of the child plan. 
# In the child, which is a [HashAggregateExec|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L517], we create the method which shows up at the top of the stack trace (called _doAggregateWithKeysOutput_ ) # We add this method to the compiled code invoking _addNewFunction_ of [CodeGenerator|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L460] In the generated function body we call the _append_ method.| Now, the _addNewFunction_ method states that: {noformat} If the code for the `OuterClass` grows too large, the function will be inlined into a new private, inner class {noformat} This indeed seems to happen: the _doAggregateWithKeysOutput_ method is put into a new private inner class. Thus, it doesn't have access to the protected _append_ method anymore but still tries to call it, which results in the _IllegalAccessError._ Possible fixes: * Pass in the _inlineToOuterClass_ flag when invoking the _addNewFunction_ * Make the _append_ method public * Re-declare the _append_ method in the generated subclass (just invoking _super_). This way, inner classes should have access to it. was: Got the following stacktrace for a large QueryPlan using WholeStageCodeGen: {noformat} java.lang.IllegalAccessError: tried to access method org.apache.spark.sql.execution.BufferedRowIterator.append(Lorg/apache/spark/sql/catalyst/InternalRow;)V from class org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass.agg_doAggregateWithKeysOutput$(Unknown
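A Scala-flavoured sketch of the third fix listed above; all class names are stand-ins, and the real generated code is Java compiled at runtime by Janino. In ordinary, statically compiled Scala the nested class can already reach the inherited method, so the bridge below only matters once the nested class is emitted into a different run-time package by the codegen class loader, which is exactly the situation in the stack trace: re-declaring {{append}} in the generated outer class makes the nested class's call resolve to a method declared by a class in its own run-time package, satisfying condition 2 of JVMS 5.4.4.

{code:scala}
// Stand-in for org.apache.spark.sql.execution.BufferedRowIterator.
abstract class BufferedRowIteratorSketch {
  private val rows = scala.collection.mutable.ArrayBuffer.empty[AnyRef]
  protected def append(row: AnyRef): Unit = rows += row
}

// Stand-in for the generated GeneratedIteratorForCodegenStage7.
class GeneratedIteratorSketch extends BufferedRowIteratorSketch {
  // Fix option 3: re-declare append, just invoking super. The method is now also
  // declared by the generated class itself, so the nested class below can call it
  // even when it ends up in a different run-time package than BufferedRowIterator.
  override protected def append(row: AnyRef): Unit = super.append(row)

  // Stand-in for the agg_NestedClass that addNewFunction splits large code into.
  private class AggNested {
    def doAggregateWithKeysOutput(row: AnyRef): Unit = append(row)
  }

  def processNext(): Unit = new AggNested().doAggregateWithKeysOutput("row-0")
}
{code}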
[jira] [Updated] (SPARK-23598) WholeStageCodegen can lead to IllegalAccessError calling append for HashAggregateExec
[ https://issues.apache.org/jira/browse/SPARK-23598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Vogelbacher updated SPARK-23598: -- Description: Got the following stacktrace for a large QueryPlan using WholeStageCodeGen: {noformat} java.lang.IllegalAccessError: tried to access method org.apache.spark.sql.execution.BufferedRowIterator.append(Lorg/apache/spark/sql/catalyst/InternalRow;)V from class org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass.agg_doAggregateWithKeysOutput$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53) at org.apache.spark.scheduler.Task.run(Task.scala:109) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345){noformat} After disabling codegen, everything works. The root cause seems to be that we are trying to call the protected _append_ method of [BufferedRowIterator|https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/BufferedRowIterator.java#L68] from an inner-class of a sub-class that is loaded by a different class-loader (after codegen compilation). [https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-5.html#jvms-5.4.4] states that a protected method _R_ can be accessed only if one of the following two conditions is fulfilled: # R is protected and is declared in a class C, and D is either a subclass of C or C itself. Furthermore, if R is not static, then the symbolic reference to R must contain a symbolic reference to a class T, such that T is either a subclass of D, a superclass of D, or D itself. # R is either protected or has default access (that is, neither public nor protected nor private), and is declared by a class in the same run-time package as D. 2.) doesn't apply as we have loaded the class with a different class loader (and are in a different package) and 1.) doesn't apply because we are apparently trying to call the method from an inner class of a subclass of _BufferedRowIterator_. Looking at the Code path of _WholeStageCodeGen_, the following happens: # In [WholeStageCodeGen|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala#L527], we create the subclass of _BufferedRowIterator_, along with a _processNext_ method for processing the output of the child plan. 
# In the child, which is a [HashAggregateExec|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L517], we create the method which shows up at the top of the stack trace (called _doAggregateWithKeysOutput_ ) # We add this method to the compiled code invoking _addNewFunction_ of [CodeGenerator |https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L460]In the generated function body we call the _append_ method. Now, the _addNewFunction_ method states that: {noformat} If the code for the `OuterClass` grows too large, the function will be inlined into a new private, inner class {noformat} This indeed seems to happen: the _doAggregateWithKeysOutput_ method is put into a new private inner class. Thus, it doesn't have access to the protected _append_ method anymore but still tries to call it, which results in the _IllegalAccessError._ Possible fixes: * Pass in the _inlineToOuterClass_ flag when invoking the _addNewFunction_ * Make the _append_ method public * Re-declare the _append_ method in the generated subclass (just invoking _super_). This way, inner classes should have access to it. was: Got the following stacktrace for a large QueryPlan using WholeStageCodeGen: {noformat} java.lang.IllegalAccessError: tried to access method org.apache.spark.sql.execution.BufferedRowIterator.append(Lorg/apache/spark/sql/catalyst/InternalRow;)V from class org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass.agg_doAggregateWithKeysOutput$(Unknown
[jira] [Updated] (SPARK-23598) WholeStageCodegen can lead to IllegalAccessError calling append for HashAggregateExec
[ https://issues.apache.org/jira/browse/SPARK-23598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Vogelbacher updated SPARK-23598: -- Description: Got the following stacktrace for a large QueryPlan using WholeStageCodeGen: {noformat} java.lang.IllegalAccessError: tried to access method org.apache.spark.sql.execution.BufferedRowIterator.append(Lorg/apache/spark/sql/catalyst/InternalRow;)V from class org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass.agg_doAggregateWithKeysOutput$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53) at org.apache.spark.scheduler.Task.run(Task.scala:109) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345){noformat} After disabling codegen, everything works. The root cause seems to be that we are trying to call the protected _append_ method of [BufferedRowIterator|https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/BufferedRowIterator.java#L68] from an inner-class of a sub-class that is loaded by a different class-loader (after codegen compilation). [https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-5.html#jvms-5.4.4] states that a protected method _R_ can be accessed only if one of the following two conditions is fulfilled: # R is protected and is declared in a class C, and D is either a subclass of C or C itself. Furthermore, if R is not static, then the symbolic reference to R must contain a symbolic reference to a class T, such that T is either a subclass of D, a superclass of D, or D itself. # R is either protected or has default access (that is, neither public nor protected nor private), and is declared by a class in the same run-time package as D. 2.) doesn't apply as we have loaded the class with a different class loader (and are in a different package) and 1.) doesn't apply because we are apparently trying to call the method from an inner class of a subclass of _BufferedRowIterator_. Looking at the Code path of _WholeStageCodeGen_, the following happens: # In [WholeStageCodeGen|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala#L527], we create the subclass of _BufferedRowIterator_, along with a _processNext_ method for processing the output of the child plan. 
# In the child, which is a [HashAggregateExec|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L517], we create the method which shows up at the top of the stack trace (called _doAggregateWithKeysOutput_ ) # We add this method to the compiled code invoking _addNewFunction_ of [CodeGenerator|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L460]In the generated function body we call the _append_ method.| Now, the _addNewFunction_ method states that: {noformat} If the code for the `OuterClass` grows too large, the function will be inlined into a new private, inner class {noformat} This indeed seems to happen: the _doAggregateWithKeysOutput_ method is put into a new private inner class. Thus, it doesn't have access to the protected _append_ method anymore but still tries to call it, which results in the _IllegalAccessError._ Possible fixes: * Pass in the _inlineToOuterClass_ flag when invoking the _addNewFunction_ * Make the _append_ method public * Re-declare the _append_ method in the generated subclass (just invoking _super_). This way, inner classes should have access to it. was: Got the following stacktrace for a large QueryPlan using WholeStageCodeGen: {noformat} java.lang.IllegalAccessError: tried to access method org.apache.spark.sql.execution.BufferedRowIterator.append(Lorg/apache/spark/sql/catalyst/InternalRow;)V from class org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass.agg_doAggregateWithKeysOutput$(Unknown
[jira] [Assigned] (SPARK-23586) Add interpreted execution for WrapOption expression
[ https://issues.apache.org/jira/browse/SPARK-23586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23586: Assignee: Apache Spark > Add interpreted execution for WrapOption expression > --- > > Key: SPARK-23586 > URL: https://issues.apache.org/jira/browse/SPARK-23586 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Herman van Hovell >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-23586) Add interpreted execution for WrapOption expression
[ https://issues.apache.org/jira/browse/SPARK-23586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23586: Assignee: (was: Apache Spark) > Add interpreted execution for WrapOption expression > --- > > Key: SPARK-23586 > URL: https://issues.apache.org/jira/browse/SPARK-23586 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Herman van Hovell >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-23586) Add interpreted execution for WrapOption expression
[ https://issues.apache.org/jira/browse/SPARK-23586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16386069#comment-16386069 ] Apache Spark commented on SPARK-23586: -- User 'mgaido91' has created a pull request for this issue: https://github.com/apache/spark/pull/20736 > Add interpreted execution for WrapOption expression > --- > > Key: SPARK-23586 > URL: https://issues.apache.org/jira/browse/SPARK-23586 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Herman van Hovell >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-23598) WholeStageCodegen can lead to IllegalAccessError calling append for HashAggregateExec
[ https://issues.apache.org/jira/browse/SPARK-23598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Vogelbacher updated SPARK-23598: -- Description: Got the following stacktrace for a large QueryPlan using WholeStageCodeGen: {noformat} java.lang.IllegalAccessError: tried to access method org.apache.spark.sql.execution.BufferedRowIterator.append(Lorg/apache/spark/sql/catalyst/InternalRow;)V from class org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass.agg_doAggregateWithKeysOutput$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53) at org.apache.spark.scheduler.Task.run(Task.scala:109) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345){noformat} After disabling codegen, everything works. The root cause seems to be that we are trying to call the protected _append_ method of [BufferedRowIterator|https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/BufferedRowIterator.java#L68] from an inner-class of a sub-class that is loaded by a different class-loader (after codegen compilation). [https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-5.html#jvms-5.4.4] states that a protected method _R_ can be accessed only if one of the following two conditions is fulfilled: # R is protected and is declared in a class C, and D is either a subclass of C or C itself. Furthermore, if R is not static, then the symbolic reference to R must contain a symbolic reference to a class T, such that T is either a subclass of D, a superclass of D, or D itself. # R is either protected or has default access (that is, neither public nor protected nor private), and is declared by a class in the same run-time package as D. 2.) doesn't apply as we have loaded the class with a different class loader (and are in a different package) and 1.) doesn't apply because we are apparently trying to call the method from an inner class of a subclass of _BufferedRowIterator_. Looking at the Code path of _WholeStageCodeGen_, the following happens: # In [WholeStageCodeGen|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala#L527], we create the subclass of _BufferedRowIterator_, along with a _processNext_ method for processing the output of the child plan. 
# In the child, which is a [HashAggregateExec|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L517], we create the method which shows up at the top of the stack trace (called _doAggregateWithKeysOutput_ ) # We add this method to the compiled code invoking _addNewFunction_ of [CodeGenerator|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L460] Now, this method states that: {noformat} If the code for the `OuterClass` grows too large, the function will be inlined into a new private, inner class {noformat} This indeed seems to happen: the _doAggregateWithKeysOutput_ method is put into a new private inner class. Thus, it doesn't have access to the protected _append_ method anymore but still tries to call it, which results in the _IllegalAccessError._ Possible fixes: * Pass in the _inlineToOuterClass_ flag when invoking the _addNewFunction_ * Make the _append_ method public * Re-declare the _append_ method in the generated subclass (just invoking _super_). This way, inner classes should have access to it. was: Got the following stacktrace for a large QueryPlan using WholeStageCodeGen: {noformat} java.lang.IllegalAccessError: tried to access method org.apache.spark.sql.execution.BufferedRowIterator.append(Lorg/apache/spark/sql/catalyst/InternalRow;)V from class org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass.agg_doAggregateWithKeysOutput$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$Genera
[jira] [Updated] (SPARK-23598) WholeStageCodegen can lead to IllegalAccessError calling append for HashAggregateExec
[ https://issues.apache.org/jira/browse/SPARK-23598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Vogelbacher updated SPARK-23598: -- Description: Got the following stacktrace for a large QueryPlan using WholeStageCodeGen: {noformat} java.lang.IllegalAccessError: tried to access method org.apache.spark.sql.execution.BufferedRowIterator.append(Lorg/apache/spark/sql/catalyst/InternalRow;)V from class org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass.agg_doAggregateWithKeysOutput$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53) at org.apache.spark.scheduler.Task.run(Task.scala:109) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345){noformat} After disabling codegen, everything works. The root cause seems to be that we are trying to call the protected _append_ method of [BufferedRowIterator|https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/BufferedRowIterator.java#L68] from an inner-class of a sub-class that is loaded by a different class-loader (after codegen compilation). [https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-5.html#jvms-5.4.4] states that a protected method _R_ can be accessed only if one of the following two conditions is fulfilled: # R is protected and is declared in a class C, and D is either a subclass of C or C itself. Furthermore, if R is not static, then the symbolic reference to R must contain a symbolic reference to a class T, such that T is either a subclass of D, a superclass of D, or D itself. # R is either protected or has default access (that is, neither public nor protected nor private), and is declared by a class in the same run-time package as D. 2.) doesn't apply as we have loaded the class with a different class loader (and are in a different package) and 1.) doesn't apply because we are apparently trying to call the method from an inner class of a subclass of _BufferedRowIterator_. Looking at the Code path of _WholeStageCodeGen_, the following happens: # In [WholeStageCodeGen|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala#L527], we create the subclass of _BufferedRowIterator_, along with a _processNext_ method for processing the output of the child plan. 
# In the child, which is a [HashAggregateExec|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L517], we create the method which shows up at the top of the stack trace (called _doAggregateWithKeysOutput_ ) # We add this method to the compiled code invoking _addNewFunction_ of [CodeGenerator|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L460] Now, this method states that: {noformat} If the code for the `OuterClass` grows too large, the function will be inlined into a new private, inner class {noformat} This indeed seems to happen: the _doAggregateWithKeysOutput_ method is put into a new private inner class. Thus, it doesn't have access to the protected _append_ method anymore but still tries to call it, which results in the ___IllegalAccessError._ Possible fixes: * Pass in the _inlineToOuterClass_ flag when invoking __ _addNewFunction_ * Make the _append_ method public * Re-declare the _append_ method in the generated subclass (just invoking _super_). This way, inner classes should have access to it. was: Got the following stacktrace for a large QueryPlan using WholeStageCodeGen: {noformat} java.lang.IllegalAccessError: tried to access method org.apache.spark.sql.execution.BufferedRowIterator.append(Lorg/apache/spark/sql/catalyst/InternalRow;)V from class org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass.agg_doAggregateWithKeysOutput$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$Gener
[jira] [Updated] (SPARK-23598) WholeStageCodegen can lead to IllegalAccessError calling append for HashAggregateExec
[ https://issues.apache.org/jira/browse/SPARK-23598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Vogelbacher updated SPARK-23598: -- Description: Got the following stacktrace for a large QueryPlan using WholeStageCodeGen: {noformat} java.lang.IllegalAccessError: tried to access method org.apache.spark.sql.execution.BufferedRowIterator.append(Lorg/apache/spark/sql/catalyst/InternalRow;)V from class org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass.agg_doAggregateWithKeysOutput$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53) at org.apache.spark.scheduler.Task.run(Task.scala:109) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345){noformat} After disabling codegen, everything works. The root cause seems to be that we are trying to call the protected _append_ method of [BufferedRowIterator|https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/BufferedRowIterator.java#L68] __ from an inner-class of a sub-class that is loaded by a different class-loader (after codegen compilation). [https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-5.html#jvms-5.4.4] states that a protected method _R_ can be accessed only if one of the following two conditions is fulfilled: # R is protected and is declared in a class C, and D is either a subclass of C or C itself. Furthermore, if R is not static, then the symbolic reference to R must contain a symbolic reference to a class T, such that T is either a subclass of D, a superclass of D, or D itself. # R is either protected or has default access (that is, neither public nor protected nor private), and is declared by a class in the same run-time package as D. 2.) doesn't apply as we have loaded the class with a different class loader (and are in a different package) and 1.) doesn't apply because we are apparently trying to call the method from an inner class of a subclass of _BufferedRowIterator_. Looking at the Code path of _WholeStageCodeGen_, the following happens: # In [WholeStageCodeGen|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala#L527], we create the subclass of _BufferedRowIterator_, along with a _processNext_ method for processing the output of the child plan. 
# In the child, which is a [HashAggregateExec|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L517], we create the method which shows up at the top of the stack trace (called _doAggregateWithKeysOutput_ ) # We add this method to the compiled code invoking _addNewFunction_ of [CodeGenerator|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L460] Now, this method states that: {noformat} If the code for the `OuterClass` grows too large, the function will be inlined into a new private, inner class {noformat} This indeed seems to happen: the _doAggregateWithKeysOutput_ method is put into a new private inner class. Thus, it doesn't have access to the protected _append_ method anymore but still tries to call it, which results in the ___IllegalAccessError._ Possible fixes: * Pass in the _inlineToOuterClass_ flag when invoking __ _addNewFunction_ * Make the _append_ method public * Re-declare the _append_ method in the generated subclass (just invoking _super_). This way, inner classes should have access to it. was: Got the following stacktrace for a large QueryPlan using WholeStageCodeGen: {code:java} java.lang.IllegalAccessError: tried to access method org.apache.spark.sql.execution.BufferedRowIterator.append(Lorg/apache/spark/sql/catalyst/InternalRow;)V from class org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass.agg_doAggregateWithKeysOutput$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$G
[jira] [Created] (SPARK-23598) WholeStageCodegen can lead to IllegalAccessError calling append for HashAggregateExec
David Vogelbacher created SPARK-23598: - Summary: WholeStageCodegen can lead to IllegalAccessError calling append for HashAggregateExec Key: SPARK-23598 URL: https://issues.apache.org/jira/browse/SPARK-23598 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.4.0 Reporter: David Vogelbacher Got the following stacktrace for a large QueryPlan using WholeStageCodeGen: {code:java} java.lang.IllegalAccessError: tried to access method org.apache.spark.sql.execution.BufferedRowIterator.append(Lorg/apache/spark/sql/catalyst/InternalRow;)V from class org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7$agg_NestedClass.agg_doAggregateWithKeysOutput$(Unknown Source) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage7.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53) at org.apache.spark.scheduler.Task.run(Task.scala:109) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345) {code} After disabling codegen, everything works. The root cause seems to be that we are trying to call the protected _append_ method of [BufferedRowIterator|https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/BufferedRowIterator.java#L68] __ from an inner-class of a sub-class that is loaded by a different class-loader (after codegen compilation). [https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-5.html#jvms-5.4.4] states that a protected method _R_ can be accessed only if one of the following two conditions is fulfilled: # R is protected and is declared in a class C, and D is either a subclass of C or C itself. Furthermore, if R is not static, then the symbolic reference to R must contain a symbolic reference to a class T, such that T is either a subclass of D, a superclass of D, or D itself. # R is either protected or has default access (that is, neither public nor protected nor private), and is declared by a class in the same run-time package as D. 2.) doesn't apply as we have loaded the class with a different class loader (and are in a different package) and 1.) doesn't apply because we are apparently trying to call the method from an inner class of a subclass of _BufferedRowIterator_. Looking at the Code path of _WholeStageCodeGen_, the following happens: # In [WholeStageCodeGen|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala#L527], we create the subclass of _BufferedRowIterator_, along with a _processNext_ method for processing the output of the child plan. 
# In the child, which is a [HashAggregateExec|https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L517], we create the method which shows up at the top of the stack trace (called _doAggregateWithKeysOutput_ ) # We add this method to the compiled code invoking _addNewFunction_ of [CodeGenerator|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L460] Now, this method states that: {noformat} If the code for the `OuterClass` grows too large, the function will be inlined into a new private, inner class {noformat} This indeed seems to happen: the _doAggregateWithKeysOutput_ method is put into a new private inner class. Thus, it doesn't have access to the protected _append_ method anymore but still tries to call it, which results in the ___IllegalAccessError._ Possible fixes: * Pass in the _inlineToOuterClass_ flag when invoking __ _addNewFunction_ * Make the _append_ method public * Re-declare the _append_ method in the generated subclass (just invoking _super_). This way, inner classes should have access to it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-23597) Audit Spark SQL code base for non-interpreted expressions
Herman van Hovell created SPARK-23597: - Summary: Audit Spark SQL code base for non-interpreted expressions Key: SPARK-23597 URL: https://issues.apache.org/jira/browse/SPARK-23597 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.3.0 Reporter: Herman van Hovell We want to eliminate expressions that do not provide an interpreted execution path from the code base. The goal of this ticket is to check if there are any others besides the ones being addressed by SPARK-23580. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
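For context, a toy sketch (deliberately not the Catalyst {{Expression}} API) of what the audit is looking for: expressions whose {{eval}} only throws and therefore work solely through generated code, versus expressions that also provide a direct, interpreted evaluation path.

{code:scala}
// Toy model of an expression tree node; Catalyst's real Expression API is richer.
trait ExprSketch {
  def eval(row: Seq[Any]): Any
}

// "Codegen-only": interpreted execution is unsupported. These are the expressions
// SPARK-23597 wants to find and the SPARK-23580 sub-tasks aim to eliminate.
final class CodegenOnlySketch extends ExprSketch {
  override def eval(row: Seq[Any]): Any =
    throw new UnsupportedOperationException("only supports code-generated execution")
}

// With an interpreted fallback: eval performs the computation directly.
final class UpperSketch(childOrdinal: Int) extends ExprSketch {
  override def eval(row: Seq[Any]): Any =
    row(childOrdinal).asInstanceOf[String].toUpperCase
}
{code}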
[jira] [Created] (SPARK-23596) Modify Dataset test harness to include interpreted execution
Herman van Hovell created SPARK-23596: - Summary: Modify Dataset test harness to include interpreted execution Key: SPARK-23596 URL: https://issues.apache.org/jira/browse/SPARK-23596 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.3.0 Reporter: Herman van Hovell We should modify the Dataset test harness to also test the interpreted code paths. This task can be started as soon as a significant subset of the object-related Expressions provides an interpreted fallback. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-23595) Add interpreted execution for ValidateExternalType expression
Herman van Hovell created SPARK-23595: - Summary: Add interpreted execution for ValidateExternalType expression Key: SPARK-23595 URL: https://issues.apache.org/jira/browse/SPARK-23595 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.3.0 Reporter: Herman van Hovell -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-23594) Add interpreted execution for GetExternalRowField expression
Herman van Hovell created SPARK-23594: - Summary: Add interpreted execution for GetExternalRowField expression Key: SPARK-23594 URL: https://issues.apache.org/jira/browse/SPARK-23594 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.3.0 Reporter: Herman van Hovell -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-23593) Add interpreted execution for InitializeJavaBean expression
Herman van Hovell created SPARK-23593: - Summary: Add interpreted execution for InitializeJavaBean expression Key: SPARK-23593 URL: https://issues.apache.org/jira/browse/SPARK-23593 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.3.0 Reporter: Herman van Hovell -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-23592) Add interpreted execution for DecodeUsingSerializer expression
Herman van Hovell created SPARK-23592: - Summary: Add interpreted execution for DecodeUsingSerializer expression Key: SPARK-23592 URL: https://issues.apache.org/jira/browse/SPARK-23592 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.3.0 Reporter: Herman van Hovell -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-23591) Add interpreted execution for EncodeUsingSerializer expression
Herman van Hovell created SPARK-23591: - Summary: Add interpreted execution for EncodeUsingSerializer expression Key: SPARK-23591 URL: https://issues.apache.org/jira/browse/SPARK-23591 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.3.0 Reporter: Herman van Hovell -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-23590) Add interpreted execution for CreateExternalRow
Herman van Hovell created SPARK-23590: - Summary: Add interpreted execution for CreateExternalRow Key: SPARK-23590 URL: https://issues.apache.org/jira/browse/SPARK-23590 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.3.0 Reporter: Herman van Hovell -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-23590) Add interpreted execution for CreateExternalRow expression
[ https://issues.apache.org/jira/browse/SPARK-23590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell updated SPARK-23590: -- Summary: Add interpreted execution for CreateExternalRow expression (was: Add interpreted execution for CreateExternalRow) > Add interpreted execution for CreateExternalRow expression > -- > > Key: SPARK-23590 > URL: https://issues.apache.org/jira/browse/SPARK-23590 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Herman van Hovell >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-23589) Add interpreted execution for ExternalMapToCatalyst expression
Herman van Hovell created SPARK-23589: - Summary: Add interpreted execution for ExternalMapToCatalyst expression Key: SPARK-23589 URL: https://issues.apache.org/jira/browse/SPARK-23589 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.3.0 Reporter: Herman van Hovell -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-23588) Add interpreted execution for CatalystToExternalMap expression
Herman van Hovell created SPARK-23588: - Summary: Add interpreted execution for CatalystToExternalMap expression Key: SPARK-23588 URL: https://issues.apache.org/jira/browse/SPARK-23588 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.3.0 Reporter: Herman van Hovell -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-23587) Add interpreted execution for MapObjects expression
[ https://issues.apache.org/jira/browse/SPARK-23587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell updated SPARK-23587: -- Issue Type: Sub-task (was: Bug) Parent: SPARK-23580 > Add interpreted execution for MapObjects expression > --- > > Key: SPARK-23587 > URL: https://issues.apache.org/jira/browse/SPARK-23587 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Herman van Hovell >Priority: Major > > Add interpreted execution for {{MapObjects}} expression. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-23587) Add interpreted execution for MapObjects expression
Herman van Hovell created SPARK-23587: - Summary: Add interpreted execution for MapObjects expression Key: SPARK-23587 URL: https://issues.apache.org/jira/browse/SPARK-23587 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.3.0 Reporter: Herman van Hovell Add interpreted execution for {{MapObjects}} expression. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-23586) Add interpreted execution for WrapOption expression
Herman van Hovell created SPARK-23586: - Summary: Add interpreted execution for WrapOption expression Key: SPARK-23586 URL: https://issues.apache.org/jira/browse/SPARK-23586 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.3.0 Reporter: Herman van Hovell -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-23585) Add interpreted execution for UnwrapOption expression
[ https://issues.apache.org/jira/browse/SPARK-23585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell updated SPARK-23585: -- Summary: Add interpreted execution for UnwrapOption expression (was: Add interpreted execition for UnwrapOption expression) > Add interpreted execution for UnwrapOption expression > - > > Key: SPARK-23585 > URL: https://issues.apache.org/jira/browse/SPARK-23585 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Herman van Hovell >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-23582) Add interpreted execution to StaticInvoke expression
[ https://issues.apache.org/jira/browse/SPARK-23582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell updated SPARK-23582: -- Summary: Add interpreted execution to StaticInvoke expression (was: Add interpreted execution to StaticInvoke) > Add interpreted execution to StaticInvoke expression > > > Key: SPARK-23582 > URL: https://issues.apache.org/jira/browse/SPARK-23582 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Herman van Hovell >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-23585) Add interpreted execition for UnwrapOption expression
[ https://issues.apache.org/jira/browse/SPARK-23585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell updated SPARK-23585: -- Summary: Add interpreted execition for UnwrapOption expression (was: Add interpreted mode for UnwrapOption expression) > Add interpreted execition for UnwrapOption expression > - > > Key: SPARK-23585 > URL: https://issues.apache.org/jira/browse/SPARK-23585 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Herman van Hovell >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-23584) Add interpreted execution to NewInstance expression
[ https://issues.apache.org/jira/browse/SPARK-23584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell updated SPARK-23584: -- Issue Type: Sub-task (was: Bug) Parent: SPARK-23580 > Add interpreted execution to NewInstance expression > --- > > Key: SPARK-23584 > URL: https://issues.apache.org/jira/browse/SPARK-23584 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Herman van Hovell >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-23585) Add interpreted mode for UnwrapOption expression
[ https://issues.apache.org/jira/browse/SPARK-23585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell updated SPARK-23585: -- Issue Type: Sub-task (was: Bug) Parent: SPARK-23580 > Add interpreted mode for UnwrapOption expression > > > Key: SPARK-23585 > URL: https://issues.apache.org/jira/browse/SPARK-23585 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Herman van Hovell >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-23584) Add interpreted execution to NewInstance expression
Herman van Hovell created SPARK-23584: - Summary: Add interpreted execution to NewInstance expression Key: SPARK-23584 URL: https://issues.apache.org/jira/browse/SPARK-23584 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.3.0 Reporter: Herman van Hovell -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-23585) Add interpreted mode for UnwrapOption expression
Herman van Hovell created SPARK-23585: - Summary: Add interpreted mode for UnwrapOption expression Key: SPARK-23585 URL: https://issues.apache.org/jira/browse/SPARK-23585 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.3.0 Reporter: Herman van Hovell -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-23582) Add interpreted execution to StaticInvoke
Herman van Hovell created SPARK-23582: - Summary: Add interpreted execution to StaticInvoke Key: SPARK-23582 URL: https://issues.apache.org/jira/browse/SPARK-23582 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.3.0 Reporter: Herman van Hovell -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-23583) Add interpreted execution to Invoke expression
Herman van Hovell created SPARK-23583: - Summary: Add interpreted execution to Invoke expression Key: SPARK-23583 URL: https://issues.apache.org/jira/browse/SPARK-23583 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.3.0 Reporter: Herman van Hovell -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
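For an Invoke-style expression, one plausible interpreted approach is Java reflection on the evaluated target object. This is a hypothetical sketch, not Spark's Invoke implementation; the object name and the simplified method lookup (by name and arity only) are assumptions. Reflection is slower than generated code, which is why it would only serve as the fallback path.

{code:scala}
import java.lang.reflect.Method

// Toy helper showing reflective method invocation on an evaluated target.
object ReflectiveInvokeSketch {
  def invoke(target: AnyRef, methodName: String, args: Seq[AnyRef]): AnyRef = {
    if (target == null) return null // a null target evaluates to null
    val method: Method = target.getClass.getMethods
      .find(m => m.getName == methodName && m.getParameterCount == args.length)
      .getOrElse(throw new NoSuchMethodException(methodName))
    method.invoke(target, args: _*)
  }
}
{code}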
[jira] [Created] (SPARK-23581) Add an interpreted version of GenerateUnsafeProjection
Herman van Hovell created SPARK-23581: - Summary: Add an interpreted version of GenerateUnsafeProjection Key: SPARK-23581 URL: https://issues.apache.org/jira/browse/SPARK-23581 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.3.0 Reporter: Herman van Hovell GenerateUnsafeProjection should have an interpreted cousin. See the parent ticket for the motivation. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
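A minimal sketch of the idea, assuming only the existing Expression.eval API: evaluate each projection expression directly instead of compiling Java source. The real ticket targets UnsafeRow output, which would additionally need to write the results into an UnsafeRow buffer; that part is omitted here, and the class name is made up.

{code:scala}
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{Expression, GenericInternalRow, Projection}

// Interpreted projection: no code generation involved.
class InterpretedProjectionSketch(expressions: Seq[Expression]) extends Projection {
  override def apply(input: InternalRow): InternalRow = {
    val values = new Array[Any](expressions.length)
    var i = 0
    while (i < expressions.length) {
      values(i) = expressions(i).eval(input) // direct evaluation of each expression
      i += 1
    }
    new GenericInternalRow(values)
  }
}
{code}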
[jira] [Created] (SPARK-23580) Interpreted mode fallback should be implemented for all expressions & projections
Herman van Hovell created SPARK-23580: - Summary: Interpreted mode fallback should be implemented for all expressions & projections Key: SPARK-23580 URL: https://issues.apache.org/jira/browse/SPARK-23580 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.3.0 Reporter: Herman van Hovell Spark SQL currently does not support interpreted mode for all expressions and projections. This is a problem for scenarios where code generation does not work or blows past the JVM class limits; we currently cannot gracefully fall back. This ticket is an umbrella to fix this class of problem in Spark SQL. The work can be divided into two main areas: - Add interpreted versions for all dataset-related expressions. - Add an interpreted version of {{GenerateUnsafeProjection}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
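A sketch of the graceful-fallback shape this umbrella asks for, under the assumption that both a code-generated and an interpreted factory exist for the same projection; the helper name and the broad exception handling are illustrative only.

{code:scala}
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.Expression

object FallbackSketch {
  type RowTransform = InternalRow => InternalRow

  def createWithFallback(
      exprs: Seq[Expression],
      codegen: Seq[Expression] => RowTransform,      // may throw if compilation fails
      interpreted: Seq[Expression] => RowTransform): RowTransform = {
    try {
      codegen(exprs)
    } catch {
      // e.g. generated code blowing past the JVM class/method limits surfaces
      // as a compilation exception; fall back to the interpreted version.
      case _: Exception => interpreted(exprs)
    }
  }
}
{code}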
[jira] [Commented] (SPARK-23510) Support read data from Hive 2.2 and Hive 2.3 metastore
[ https://issues.apache.org/jira/browse/SPARK-23510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16386010#comment-16386010 ] Apache Spark commented on SPARK-23510: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/20734 > Support read data from Hive 2.2 and Hive 2.3 metastore > -- > > Key: SPARK-23510 > URL: https://issues.apache.org/jira/browse/SPARK-23510 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 2.4.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-23546) Refactor non-stateful methods/values in CodegenContext
[ https://issues.apache.org/jira/browse/SPARK-23546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell resolved SPARK-23546. --- Resolution: Fixed Assignee: Kazuaki Ishizaki Target Version/s: 2.3.0 > Refactor non-stateful methods/values in CodegenContext > -- > > Key: SPARK-23546 > URL: https://issues.apache.org/jira/browse/SPARK-23546 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Kazuaki Ishizaki >Assignee: Kazuaki Ishizaki >Priority: Major > > The current {{CodegenContext}} class also contains immutable values and > methods that do not depend on mutable state. > This refactoring moves them to the {{CodeGenerator}} object, which can be > accessed from anywhere in the program without an instantiated {{CodegenContext}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
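An illustrative sketch with made-up names (MyCodegenContext / MyCodeGenerator) of the kind of move the ticket describes: helpers that need no mutable state migrate from the per-query context class to a shared object, so callers do not need a context instance.

{code:scala}
class MyCodegenContext {
  // mutable, per-query state stays on the context instance
  private var freshNameId = 0
  def freshName(prefix: String): String = { freshNameId += 1; s"${prefix}_$freshNameId" }
}

object MyCodeGenerator {
  // stateless utility: same result for the same input, callable without a context
  def boxedType(javaType: String): String = javaType match {
    case "int"     => "java.lang.Integer"
    case "long"    => "java.lang.Long"
    case "boolean" => "java.lang.Boolean"
    case other     => other
  }
}
{code}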
[jira] [Resolved] (SPARK-23516) I think it is unnecessary to transfer unroll memory to storage memory
[ https://issues.apache.org/jira/browse/SPARK-23516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liuxian resolved SPARK-23516. - Resolution: Invalid > I think it is unnecessary to transfer unroll memory to storage memory > -- > > Key: SPARK-23516 > URL: https://issues.apache.org/jira/browse/SPARK-23516 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: liuxian >Priority: Minor > > In fact, unroll memory is also storage memory, so I think it is unnecessary to > actually release the unroll memory and then acquire storage memory again. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
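A toy model (hypothetical names, not Spark's MemoryStore API) of the bookkeeping the ticket questions: after a block is unrolled, the current code releases the unroll memory and re-acquires the same amount as storage memory, whereas the reporter argues the unrolled bytes already occupy the storage pool.

{code:scala}
class StoragePoolSketch(var unrollBytes: Long = 0L, var storageBytes: Long = 0L) {

  // Current pattern: release unroll memory, then acquire storage memory.
  def transferViaReleaseAndAcquire(bytes: Long): Unit = {
    unrollBytes -= bytes     // release unroll memory
    storageBytes += bytes    // re-acquire as storage memory
  }

  // Either way, the total footprint against the unified pool is unchanged,
  // which is the reporter's point.
  def totalUsed: Long = unrollBytes + storageBytes
}
{code}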