[jira] [Created] (SPARK-22246) UnsafeRow, UnsafeArrayData, and UnsafeMapData use MemoryBlock
Kazuaki Ishizaki created SPARK-22246:
-------------------------------------

             Summary: UnsafeRow, UnsafeArrayData, and UnsafeMapData use MemoryBlock
                 Key: SPARK-22246
                 URL: https://issues.apache.org/jira/browse/SPARK-22246
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.2.0
            Reporter: Kazuaki Ishizaki


Using {{MemoryBlock}} can improve both the flexibility of choosing a memory type and the runtime performance of memory accesses with {{Unsafe}}. This JIRA entry proposes to use {{MemoryBlock}} in {{UnsafeRow}}, {{UnsafeArrayData}}, and {{UnsafeMapData}}.
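For context, a minimal sketch of the shape such a change could take. The {{MemoryBlock}} methods and the row layout below are assumptions for illustration, not Spark's actual API:

{code}
// Hedged sketch: hiding UnsafeRow's raw (baseObject, baseOffset) pair behind a
// MemoryBlock-like abstraction so the backing memory type can vary.
// All names here are illustrative assumptions.
abstract class MemoryBlockSketch(val obj: AnyRef, val offset: Long, val length: Long) {
  def getLong(off: Long): Long
  def putLong(off: Long, value: Long): Unit
}

// An on-heap implementation backed by a byte array, accessed via Spark's Platform.
class ByteArrayBlock(array: Array[Byte], offset: Long, length: Long)
    extends MemoryBlockSketch(array, offset, length) {
  import org.apache.spark.unsafe.Platform
  override def getLong(off: Long): Long = Platform.getLong(array, offset + off)
  override def putLong(off: Long, v: Long): Unit = Platform.putLong(array, offset + off, v)
}

// A row can then address fields through the block instead of a raw (obj, offset) pair.
class RowSketch(block: MemoryBlockSketch) {
  def getLongField(ordinal: Int): Long = block.getLong(8L * ordinal) // layout is illustrative
}
{code}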
[jira] [Commented] (SPARK-22226) splitExpression can create too many method calls (generating a Constant Pool limit error)
[ https://issues.apache.org/jira/browse/SPARK-22226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197272#comment-16197272 ]

Kazuaki Ishizaki commented on SPARK-22226:
------------------------------------------

You are right. [This PR|https://github.com/apache/spark/pull/16648] will not solve the issue caused by a large number of split methods. I missed the discussion we had [here|https://github.com/apache/spark/pull/19447].

> splitExpression can create too many method calls (generating a Constant Pool
> limit error)
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-22226
>                 URL: https://issues.apache.org/jira/browse/SPARK-22226
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Marco Gaido
>
> Code generation for very wide datasets can fail because the Constant Pool
> limit is reached.
> This can happen for many reasons. One of them is that we currently split the
> definitions of the generated methods among several {{NestedClass}}es, but all
> these methods are called in the main class. Since entries are added to the
> constant pool for each method invocation, this limits the number of rows and,
> for very wide datasets, leads to:
> {noformat}
> org.codehaus.janino.JaninoRuntimeException: Constant pool for class
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificMutableProjection
> has grown past JVM limit of 0xFFFF
> {noformat}
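To make the failure mode concrete, here is an illustrative sketch (not Spark's actual {{splitExpressions}} code) of why splitting helps with the 64 KB method-size limit but not with the constant pool: every call site in the outer class still adds Methodref entries to that class's pool, which is capped at 0xFFFF entries.

{code}
// Hedged sketch: generating many small methods plus one caller. Each
// `apply_N(i);` call site contributes constant-pool entries to the outer
// class, so enough splits overflow the 0xFFFF-entry pool anyway.
def generateSource(numSplits: Int): String = {
  val methods = (0 until numSplits)
    .map(n => s"  private void apply_$n(InternalRow i) { /* slice $n */ }")
    .mkString("\n")
  val calls = (0 until numSplits)
    .map(n => s"    apply_$n(i);")
    .mkString("\n")
  s"""class SpecificMutableProjection {
     |$methods
     |  public void apply(InternalRow i) {
     |$calls
     |  }
     |}""".stripMargin
}
{code}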
[jira] [Commented] (SPARK-22226) Code generation fails for dataframes with 10000 columns
[ https://issues.apache.org/jira/browse/SPARK-22226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197164#comment-16197164 ]

Kazuaki Ishizaki commented on SPARK-22226:
------------------------------------------

[This PR|https://github.com/apache/spark/pull/16648] addresses this issue.

> Code generation fails for dataframes with 10000 columns
> --------------------------------------------------------
>
>                 Key: SPARK-22226
>                 URL: https://issues.apache.org/jira/browse/SPARK-22226
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Marco Gaido
>
> Code generation for very wide datasets can fail because the Constant Pool
> limit is reached.
> This can happen for many reasons. One of them is that we currently split the
> definitions of the generated methods among several {{NestedClass}}es, but all
> these methods are called in the main class. Since entries are added to the
> constant pool for each method invocation, this limits the number of rows and,
> for very wide datasets, leads to:
> {noformat}
> org.codehaus.janino.JaninoRuntimeException: Constant pool for class
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificMutableProjection
> has grown past JVM limit of 0xFFFF
> {noformat}
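A hedged repro sketch for this class of failure, runnable in spark-shell; whether it actually fails depends on the Spark version and the exact width, so treat it as illustrative ({{debugCodegen}} is Spark's debugging helper for inspecting generated code):

{code}
import org.apache.spark.sql.functions._

// Build a very wide projection; 10000 literal columns is illustrative.
val wide = spark.range(1).select((0 until 10000).map(i => lit(i).as(s"c$i")): _*)

// Inspect the generated Java source instead of guessing at its size.
import org.apache.spark.sql.execution.debug._
wide.debugCodegen()

// May throw: JaninoRuntimeException: Constant pool ... has grown past JVM limit of 0xFFFF
wide.collect()
{code}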
[jira] [Created] (SPARK-22219) Refactor "spark.sql.codegen.comments"
Kazuaki Ishizaki created SPARK-22219:
-------------------------------------

             Summary: Refactor "spark.sql.codegen.comments"
                 Key: SPARK-22219
                 URL: https://issues.apache.org/jira/browse/SPARK-22219
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.2.0
            Reporter: Kazuaki Ishizaki
            Priority: Minor


The current way of getting the value of {{"spark.sql.codegen.comments"}} does not follow the latest approach. This refactoring uses a better approach to read the value of {{"spark.sql.codegen.comments"}}.
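For illustration, the config-definition pattern that newer Spark code favors looks roughly like the sketch below; the entry name {{CODEGEN_COMMENTS}} and its default are assumptions, not the actual refactoring:

{code}
import org.apache.spark.internal.config.ConfigBuilder

// Hedged sketch: define the key once as a typed entry instead of looking the
// raw string up at each call site.
val CODEGEN_COMMENTS = ConfigBuilder("spark.sql.codegen.comments")
  .doc("When true, put comments in the generated code.")
  .booleanConf
  .createWithDefault(false)

// Call sites then read the typed entry, e.g.:
//   val comments = sqlContext.conf.getConf(CODEGEN_COMMENTS)
{code}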
[jira] [Commented] (SPARK-19984) ERROR codegen.CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java'
[ https://issues.apache.org/jira/browse/SPARK-19984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16194937#comment-16194937 ]

Kazuaki Ishizaki commented on SPARK-19984:
------------------------------------------

[~JohnSteidley] Here is my program that produced the above plan. Any comments that help reproduce the problem are appreciated.

{code}
test("SPARK-19984") {
  withSQLConf(
    "spark.sql.shuffle.partitions" -> "1",
    "spark.sql.join.preferSortMergeJoin" -> "true",
    "spark.sql.autoBroadcastJoinThreshold" -> "-1") {
    withTempPath { dir =>
      val t = sparkContext.parallelize(Seq("data1", "data2", "data3")).toDF("id")
      t.write.parquet(dir.getCanonicalPath)
      val df3 = sqlContext.read.parquet(dir.getCanonicalPath)
      val dfA = df3.limit(3)
      val dfB = dfA.select(col("id").alias("A"), col("id").alias("B"))
      val dfC = df3.select(col("id").alias("A"))
      val df1 = dfC.join(dfB, "A")
      val df = df1.groupBy().count()
      df.explain(true)
      df.collect
    }
  }
}
{code}

> ERROR codegen.CodeGenerator: failed to compile:
> org.codehaus.commons.compiler.CompileException: File 'generated.java'
> ----------------------------------------------------------------------
>
>                 Key: SPARK-19984
>                 URL: https://issues.apache.org/jira/browse/SPARK-19984
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 2.1.0
>            Reporter: Andrey Yakovenko
>         Attachments: after_adding_count.txt, before_adding_count.txt
>
> I had this error a few times on my local Hadoop 2.7.3 + Spark 2.1.0
> environment. This is not a permanent error; the next time I run it, it can
> disappear. Unfortunately, I don't know how to reproduce the issue. As you can
> see from the log, my logic is pretty complicated.
> Here is a part of the log I've got (container_1489514660953_0015_01_01)
> {code}
> 17/03/16 11:07:04 ERROR codegen.CodeGenerator: failed to compile:
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line
> 151, Column 29: A method named "compare" is not declared in any enclosing
> class nor any supertype, nor through a static import
> /* 001 */ public Object generate(Object[] references) {
> /* 002 */   return new GeneratedIterator(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
> /* 006 */   private Object[] references;
> /* 007 */   private scala.collection.Iterator[] inputs;
> /* 008 */   private boolean agg_initAgg;
> /* 009 */   private boolean agg_bufIsNull;
> /* 010 */   private long agg_bufValue;
> /* 011 */   private boolean agg_initAgg1;
> /* 012 */   private boolean agg_bufIsNull1;
> /* 013 */   private long agg_bufValue1;
> /* 014 */   private scala.collection.Iterator smj_leftInput;
> /* 015 */   private scala.collection.Iterator smj_rightInput;
> /* 016 */   private InternalRow smj_leftRow;
> /* 017 */   private InternalRow smj_rightRow;
> /* 018 */   private UTF8String smj_value2;
> /* 019 */   private java.util.ArrayList smj_matches;
> /* 020 */   private UTF8String smj_value3;
> /* 021 */   private UTF8String smj_value4;
> /* 022 */   private org.apache.spark.sql.execution.metric.SQLMetric smj_numOutputRows;
> /* 023 */   private UnsafeRow smj_result;
> /* 024 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder smj_holder;
> /* 025 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter smj_rowWriter;
> /* 026 */   private org.apache.spark.sql.execution.metric.SQLMetric agg_numOutputRows;
> /* 027 */   private org.apache.spark.sql.execution.metric.SQLMetric agg_aggTime;
> /* 028 */   private UnsafeRow agg_result;
> /* 029 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder agg_holder;
> /* 030 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter agg_rowWriter;
> /* 031 */   private org.apache.spark.sql.execution.metric.SQLMetric agg_numOutputRows1;
> /* 032 */   private org.apache.spark.sql.execution.metric.SQLMetric agg_aggTime1;
> /* 033 */   private UnsafeRow agg_result1;
> /* 034 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder agg_holder1;
> /* 035 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter agg_rowWriter1;
> /* 036 */
> /* 037 */   public GeneratedIterator(Object[] references) {
> /* 038 */     this.references = references;
> /* 039 */   }
> /* 040 */
> /* 041 */   public void init(int index, scala.collection.Iterator[] inputs) {
> /* 042 */     partitionIndex = index;
> /* 043 */     this.inputs = inputs;
> /* 044 */     wholestagecodegen_init_0();
> /* 045 */     wholestagecodegen_init_1();
> /* 046 */
> /* 047 */   }
> /* 04
[jira] [Commented] (SPARK-19984) ERROR codegen.CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java'
[ https://issues.apache.org/jira/browse/SPARK-19984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16193201#comment-16193201 ]

Kazuaki Ishizaki commented on SPARK-19984:
------------------------------------------

[~JohnSteidley] Thank you for your comment. I updated my program and got the same physical plan as the one you provided. However, my program works on the Spark 2.1.2 RC and on the master branch. While the physical plan tries to {{SortMergeJoin}} two strings, the generated code seems to {{SortMergeJoin}} a string and a long value for {{count(1)}}. That strange {{SortMergeJoin}} is what I cannot understand.

{code}
== Physical Plan ==
*HashAggregate(keys=[], functions=[count(1)], output=[count#35L])
+- *HashAggregate(keys=[], functions=[partial_count(1)], output=[count#40L])
   +- *Project
      +- *SortMergeJoin [A#25], [A#20], Inner
         :- *Sort [A#25 ASC NULLS FIRST], false, 0
         :  +- Exchange hashpartitioning(A#25, 1)
         :     +- *Project [id#16 AS A#25]
         :        +- *Filter isnotnull(id#16)
         :           +- *FileScan parquet [id#16] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:..., PartitionFilters: [], PushedFilters: [IsNotNull(id)], ReadSchema: struct
         +- *Sort [A#20 ASC NULLS FIRST], false, 0
            +- *Project [id#16 AS A#20]
               +- *Filter isnotnull(id#16)
                  +- *GlobalLimit 3
                     +- Exchange SinglePartition
                        +- *LocalLimit 3
                           +- *FileScan parquet [id#16] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct
{code}

> ERROR codegen.CodeGenerator: failed to compile:
> org.codehaus.commons.compiler.CompileException: File 'generated.java'
> ----------------------------------------------------------------------
>
>                 Key: SPARK-19984
>                 URL: https://issues.apache.org/jira/browse/SPARK-19984
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 2.1.0
>            Reporter: Andrey Yakovenko
>         Attachments: after_adding_count.txt, before_adding_count.txt
>
> I had this error a few times on my local Hadoop 2.7.3 + Spark 2.1.0
> environment. This is not a permanent error; the next time I run it, it can
> disappear. Unfortunately, I don't know how to reproduce the issue. As you can
> see from the log, my logic is pretty complicated.
[jira] [Commented] (SPARK-19984) ERROR codegen.CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java'
[ https://issues.apache.org/jira/browse/SPARK-19984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16190050#comment-16190050 ]

Kazuaki Ishizaki commented on SPARK-19984:
------------------------------------------

[~JohnSteidley] Thank you for providing valuable information. I managed to write a small program, derived from your example, that produces the following physical plan. I think this physical plan is almost the same as the one you provided; the only difference appears to be whether parquet is used or not. However, I cannot reproduce the issue with this small program on Spark 2.1 or the master branch. Tomorrow, I will try to use parquet in the small program.

{code}
== Physical Plan ==
*HashAggregate(keys=[], functions=[count(A#52)], output=[count(A)#63L])
+- *HashAggregate(keys=[], functions=[partial_count(A#52)], output=[count#68L])
   +- *Project [A#52]
      +- *SortMergeJoin [A#52], [A#43], Inner
         :- *Sort [A#52 ASC NULLS FIRST], false, 0
         :  +- Exchange hashpartitioning(A#52, 1)
         :     +- *Project [value#50 AS A#52]
         :        +- *Filter isnotnull(value#50)
         :           +- *SerializeFromObject [staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, java.lang.String, true], true) AS value#50]
         :              +- Scan ExternalRDDScan[obj#49]
         +- *Sort [A#43 ASC NULLS FIRST], false, 0
            +- *Project [id#39 AS A#43]
               +- *Filter isnotnull(id#39)
                  +- *GlobalLimit 2
                     +- Exchange SinglePartition
                        +- *LocalLimit 2
                           +- *Project [value#37 AS id#39]
                              +- *SerializeFromObject [staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, java.lang.String, true], true) AS value#37]
                                 +- Scan ExternalRDDScan[obj#36]
{code}

> ERROR codegen.CodeGenerator: failed to compile:
> org.codehaus.commons.compiler.CompileException: File 'generated.java'
> ----------------------------------------------------------------------
>
>                 Key: SPARK-19984
>                 URL: https://issues.apache.org/jira/browse/SPARK-19984
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 2.1.0
>            Reporter: Andrey Yakovenko
>         Attachments: after_adding_count.txt, before_adding_count.txt
>
> I had this error a few times on my local Hadoop 2.7.3 + Spark 2.1.0
> environment. This is not a permanent error; the next time I run it, it can
> disappear. Unfortunately, I don't know how to reproduce the issue. As you can
> see from the log, my logic is pretty complicated.
[jira] [Commented] (SPARK-19984) ERROR codegen.CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java'
[ https://issues.apache.org/jira/browse/SPARK-19984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16188320#comment-16188320 ]

Kazuaki Ishizaki commented on SPARK-19984:
------------------------------------------

[~JohnSteidley] Thank you for your report. I also confirmed that your code snippet cannot reproduce this issue, so it is still hard to fix. If you cannot share the whole codebase, can you post the result of {{df.explain(true)}} for the result of the {{join}}?

> ERROR codegen.CodeGenerator: failed to compile:
> org.codehaus.commons.compiler.CompileException: File 'generated.java'
> ----------------------------------------------------------------------
>
>                 Key: SPARK-19984
>                 URL: https://issues.apache.org/jira/browse/SPARK-19984
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 2.1.0
>            Reporter: Andrey Yakovenko
>
> I had this error a few times on my local Hadoop 2.7.3 + Spark 2.1.0
> environment. This is not a permanent error; the next time I run it, it can
> disappear. Unfortunately, I don't know how to reproduce the issue. As you can
> see from the log, my logic is pretty complicated.
> Here is a part of the log I've got (container_1489514660953_0015_01_01)
> {code}
> 17/03/16 11:07:04 ERROR codegen.CodeGenerator: failed to compile:
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line
> 151, Column 29: A method named "compare" is not declared in any enclosing
> class nor any supertype, nor through a static import
> /* 001 */ public Object generate(Object[] references) {
> /* 002 */   return new GeneratedIterator(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
> /* 006 */   private Object[] references;
> /* 007 */   private scala.collection.Iterator[] inputs;
> /* 008 */   private boolean agg_initAgg;
> /* 009 */   private boolean agg_bufIsNull;
> /* 010 */   private long agg_bufValue;
> /* 011 */   private boolean agg_initAgg1;
> /* 012 */   private boolean agg_bufIsNull1;
> /* 013 */   private long agg_bufValue1;
> /* 014 */   private scala.collection.Iterator smj_leftInput;
> /* 015 */   private scala.collection.Iterator smj_rightInput;
> /* 016 */   private InternalRow smj_leftRow;
> /* 017 */   private InternalRow smj_rightRow;
> /* 018 */   private UTF8String smj_value2;
> /* 019 */   private java.util.ArrayList smj_matches;
> /* 020 */   private UTF8String smj_value3;
> /* 021 */   private UTF8String smj_value4;
> /* 022 */   private org.apache.spark.sql.execution.metric.SQLMetric smj_numOutputRows;
> /* 023 */   private UnsafeRow smj_result;
> /* 024 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder smj_holder;
> /* 025 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter smj_rowWriter;
> /* 026 */   private org.apache.spark.sql.execution.metric.SQLMetric agg_numOutputRows;
> /* 027 */   private org.apache.spark.sql.execution.metric.SQLMetric agg_aggTime;
> /* 028 */   private UnsafeRow agg_result;
> /* 029 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder agg_holder;
> /* 030 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter agg_rowWriter;
> /* 031 */   private org.apache.spark.sql.execution.metric.SQLMetric agg_numOutputRows1;
> /* 032 */   private org.apache.spark.sql.execution.metric.SQLMetric agg_aggTime1;
> /* 033 */   private UnsafeRow agg_result1;
> /* 034 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder agg_holder1;
> /* 035 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter agg_rowWriter1;
> /* 036 */
> /* 037 */   public GeneratedIterator(Object[] references) {
> /* 038 */     this.references = references;
> /* 039 */   }
> /* 040 */
> /* 041 */   public void init(int index, scala.collection.Iterator[] inputs) {
> /* 042 */     partitionIndex = index;
> /* 043 */     this.inputs = inputs;
> /* 044 */     wholestagecodegen_init_0();
> /* 045 */     wholestagecodegen_init_1();
> /* 046 */
> /* 047 */   }
> /* 048 */
> /* 049 */   private void wholestagecodegen_init_0() {
> /* 050 */     agg_initAgg = false;
> /* 051 */
> /* 052 */     agg_initAgg1 = false;
> /* 053 */
> /* 054 */     smj_leftInput = inputs[0];
> /* 055 */     smj_rightInput = inputs[1];
> /* 056 */
> /* 057 */     smj_rightRow = null;
> /* 058 */
> /* 059 */     smj_matches = new java.util.ArrayList();
> /* 060 */
> /* 061 */     this.smj_numOutputRows = (org.apache.spark.sql.execution.metric.SQLMetric) references[0];
> /* 062 */     smj_result = new UnsafeRow(2);
> /* 063 */     this.smj_holder = new org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(smj_result, 64);
> /* 064 */     th
[jira] [Commented] (SPARK-18016) Code Generation: Constant Pool Past Limit for Wide/Nested Dataset
[ https://issues.apache.org/jira/browse/SPARK-18016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16184206#comment-16184206 ]

Kazuaki Ishizaki commented on SPARK-18016:
------------------------------------------

Thank you for reporting this again. I pinged the original author in [this PR|https://github.com/apache/spark/pull/16648], but it has not happened yet.

> Code Generation: Constant Pool Past Limit for Wide/Nested Dataset
> ------------------------------------------------------------------
>
>                 Key: SPARK-18016
>                 URL: https://issues.apache.org/jira/browse/SPARK-18016
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Aleksander Eskilson
>            Assignee: Aleksander Eskilson
>             Fix For: 2.3.0
>
> When attempting to encode collections of large Java objects to Datasets
> having very wide or deeply nested schemas, code generation can fail, yielding:
> {code}
> Caused by: org.codehaus.janino.JaninoRuntimeException: Constant pool for class
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection
> has grown past JVM limit of 0xFFFF
>         at org.codehaus.janino.util.ClassFile.addToConstantPool(ClassFile.java:499)
>         at org.codehaus.janino.util.ClassFile.addConstantNameAndTypeInfo(ClassFile.java:439)
>         at org.codehaus.janino.util.ClassFile.addConstantMethodrefInfo(ClassFile.java:358)
>         at org.codehaus.janino.UnitCompiler.writeConstantMethodrefInfo(UnitCompiler.java:4)
>         at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4547)
>         at org.codehaus.janino.UnitCompiler.access$7500(UnitCompiler.java:206)
>         at org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:3774)
>         at org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:3762)
>         at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:4328)
>         at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3762)
>         at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4933)
>         at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:3180)
>         at org.codehaus.janino.UnitCompiler.access$5000(UnitCompiler.java:206)
>         at org.codehaus.janino.UnitCompiler$9.visitMethodInvocation(UnitCompiler.java:3151)
>         at org.codehaus.janino.UnitCompiler$9.visitMethodInvocation(UnitCompiler.java:3139)
>         at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:4328)
>         at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3139)
>         at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2112)
>         at org.codehaus.janino.UnitCompiler.access$1700(UnitCompiler.java:206)
>         at org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1377)
>         at org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1370)
>         at org.codehaus.janino.Java$ExpressionStatement.accept(Java.java:2558)
>         at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:1370)
>         at org.codehaus.janino.UnitCompiler.compileStatements(UnitCompiler.java:1450)
>         at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:2811)
>         at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1262)
>         at org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1234)
>         at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:538)
>         at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:890)
>         at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:894)
>         at org.codehaus.janino.UnitCompiler.access$600(UnitCompiler.java:206)
>         at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:377)
>         at org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:369)
>         at org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1128)
>         at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:369)
>         at org.codehaus.janino.UnitCompiler.compileDeclaredMemberTypes(UnitCompiler.java:1209)
>         at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:564)
>         at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:420)
>         at org.codehaus.janino.UnitCompiler.access$400(UnitCompiler.java:206)
>         at org.codehaus.janino.UnitCompiler$2.visitPackageMemberClassDeclaration(UnitCompiler.java:374)
>         at org.codehaus.janino.UnitCompiler$2.visitPackageMemberClassDeclaration(UnitCompiler.java:369)
>         at org.codehaus.janino.Java$AbstractPackageMemberClassDeclaration.accept(Java.java:1309)
>         at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:369)
>         at org.codehaus
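Stepping back from the thread, a hedged repro sketch of the failure shape described above; the width needed to overflow the constant pool depends on the Spark version, so the numbers are illustrative:

{code}
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val width = 5000  // illustrative; real thresholds vary
val schema = StructType((0 until width).map(i => StructField(s"c$i", DoubleType)))
val rows = spark.sparkContext.parallelize(Seq(Row.fromSeq(Seq.fill(width)(0.0))))
val df = spark.createDataFrame(rows, schema)

// Projecting every column forces one SpecificUnsafeProjection over the whole
// row; at some width its class constant pool exceeds the 0xFFFF JVM limit.
df.select(df.columns.map(org.apache.spark.sql.functions.col): _*).collect()
{code}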
[jira] [Updated] (SPARK-22130) UTF8String.trim() inefficiently scans an all-white-space string twice
[ https://issues.apache.org/jira/browse/SPARK-22130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kazuaki Ishizaki updated SPARK-22130:
-------------------------------------
    Issue Type: Improvement  (was: Bug)

> UTF8String.trim() inefficiently scans an all-white-space string twice
> ----------------------------------------------------------------------
>
>                 Key: SPARK-22130
>                 URL: https://issues.apache.org/jira/browse/SPARK-22130
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: Kazuaki Ishizaki
>            Priority: Minor
>
> {{UTF8String.trim()}} inefficiently scans a string that contains only white
> space (e.g. {{"   "}}) twice.
[jira] [Commented] (SPARK-16845) org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB
[ https://issues.apache.org/jira/browse/SPARK-16845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16181190#comment-16181190 ]

Kazuaki Ishizaki commented on SPARK-16845:
------------------------------------------

[~mvelusce] Thank you for reporting an issue with a repro. I can reproduce this. If I am correct, Spark 2.2 can fall back to a path that disables codegen, thanks to [this PR|https://github.com/apache/spark/pull/17087]. We once tried to backport this to Spark 2.1, but it was rejected.

> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering"
> grows beyond 64 KB
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-16845
>                 URL: https://issues.apache.org/jira/browse/SPARK-16845
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: hejie
>            Assignee: Liwei Lin
>             Fix For: 1.6.4, 2.0.3, 2.1.1, 2.2.0
>
>         Attachments: error.txt.zip
>
> I have a wide table (400 columns). When I try fitting the training data on
> all columns, the fatal error occurs.
> ... 46 more
> Caused by: org.codehaus.janino.JaninoRuntimeException: Code of method
> "(Lorg/apache/spark/sql/catalyst/InternalRow;Lorg/apache/spark/sql/catalyst/InternalRow;)I"
> of class
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering"
> grows beyond 64 KB
>         at org.codehaus.janino.CodeContext.makeSpace(CodeContext.java:941)
>         at org.codehaus.janino.CodeContext.write(CodeContext.java:854)
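For readers hitting this on 2.2+, a hedged sketch of the relevant knobs; both config names exist in Spark 2.x, but whether they avoid a given failure depends on which code path generates the oversized class:

{code}
// Allow falling back to interpreted expression evaluation when generated
// code fails to compile (the Spark 2.2 fallback behavior referenced above).
spark.conf.set("spark.sql.codegen.fallback", "true")

// A blunter workaround: disable whole-stage code generation entirely,
// trading speed for robustness.
spark.conf.set("spark.sql.codegen.wholeStage", "false")
{code}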
[jira] [Commented] (SPARK-22130) UTF8String.trim() inefficiently scans an all-white-space string twice
[ https://issues.apache.org/jira/browse/SPARK-22130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16181070#comment-16181070 ]

Kazuaki Ishizaki commented on SPARK-22130:
------------------------------------------

I will submit a PR soon.

> UTF8String.trim() inefficiently scans an all-white-space string twice
> ----------------------------------------------------------------------
>
>                 Key: SPARK-22130
>                 URL: https://issues.apache.org/jira/browse/SPARK-22130
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: Kazuaki Ishizaki
>            Priority: Minor
>
> {{UTF8String.trim()}} inefficiently scans a string that contains only white
> space (e.g. {{"   "}}) twice.
[jira] [Created] (SPARK-22130) UTF8String.trim() inefficiently scans an all-white-space string twice
Kazuaki Ishizaki created SPARK-22130:
-------------------------------------

             Summary: UTF8String.trim() inefficiently scans an all-white-space string twice
                 Key: SPARK-22130
                 URL: https://issues.apache.org/jira/browse/SPARK-22130
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.2.0
            Reporter: Kazuaki Ishizaki
            Priority: Minor


{{UTF8String.trim()}} inefficiently scans a string that contains only white space (e.g. {{"   "}}) twice.
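For illustration, a minimal single-scan sketch of the idea in plain Scala over a {{String}} ({{UTF8String}} itself operates on bytes, so this is not Spark's actual implementation or its eventual fix):

{code}
// Hedged sketch: trim without scanning an all-space string twice.
// The first loop advances `start`; because the second loop only walks the
// already-narrowed range, a string of only spaces is scanned once, not twice.
def trimOnce(s: String): String = {
  var start = 0
  var end = s.length - 1
  while (start <= end && s.charAt(start) == ' ') start += 1
  while (end > start && s.charAt(end) == ' ') end -= 1
  if (start > end) "" else s.substring(start, end + 1)
}

// trimOnce("   ") == "", having touched each character at most once.
{code}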
[jira] [Comment Edited] (SPARK-22105) Dataframe has poor performance when computing on many columns with codegen
[ https://issues.apache.org/jira/browse/SPARK-22105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16176655#comment-16176655 ]

Kazuaki Ishizaki edited comment on SPARK-22105 at 9/22/17 4:22 PM:
-------------------------------------------------------------------

Can these PRs at https://issues.apache.org/jira/browse/SPARK-21870 and https://issues.apache.org/jira/browse/SPARK-21871 alleviate this issue?


was (Author: kiszk):
Can this PR at https://issues.apache.org/jira/browse/SPARK-21871 alleviate this issue?

> Dataframe has poor performance when computing on many columns with codegen
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-22105
>                 URL: https://issues.apache.org/jira/browse/SPARK-22105
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, SQL
>    Affects Versions: 2.3.0
>            Reporter: Weichen Xu
>            Priority: Minor
>
> Suppose we have a dataframe with many columns (e.g. 100 columns), where each
> column is DoubleType, and we need to compute avg on each column. We will find
> that using dataframe avg is much slower than using RDD.aggregate.
> I observed this issue in this PR (one-pass imputer):
> https://github.com/apache/spark/pull/18902
> I also wrote minimal testing code to reproduce this issue, using a sum
> computation:
> https://github.com/apache/spark/compare/master...WeichenXu123:aggr_test2?expand=1
> When we compute `sum` on 100 `DoubleType` columns, dataframe avg is about 3x
> slower than `RDD.aggregate`, but if we compute only one column, dataframe avg
> is much faster than `RDD.aggregate`.
> The reason for this issue should be a defect in dataframe codegen. Codegen
> inlines everything and generates a large code block. When the column number
> is large (e.g. 100 columns), the generated code is too large, which causes
> the JVM to fail to JIT-compile it and fall back to bytecode interpretation.
> This PR should address the issue:
> https://github.com/apache/spark/pull/19082
> But we need more performance tests against some code in ML after the above
> PR is merged, to check whether this issue is actually fixed.
> This JIRA is used to track this performance issue.
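To make the comparison concrete, a hedged sketch of the two paths being compared; the column count and data are illustrative (this is a shape, not a measurement):

{code}
import org.apache.spark.sql.functions._

val n = 100
val df = spark.range(1000000).select((0 until n).map(i => (rand() * 10).as(s"c$i")): _*)

// DataFrame path: codegen fuses all n aggregates into one large generated
// method, which the JIT may refuse to compile once it grows too big.
val sums = df.columns.map(c => sum(col(c)))
df.agg(sums.head, sums.tail: _*).collect()

// RDD path: a small hand-written update loop that stays JIT-friendly.
df.rdd.aggregate(new Array[Double](n))(
  (acc, row) => { var i = 0; while (i < n) { acc(i) += row.getDouble(i); i += 1 }; acc },
  (a, b)     => { var i = 0; while (i < n) { a(i) += b(i); i += 1 }; a }
)
{code}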
[jira] [Commented] (SPARK-22105) Dataframe has poor performance when computing on many columns with codegen
[ https://issues.apache.org/jira/browse/SPARK-22105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16176655#comment-16176655 ]

Kazuaki Ishizaki commented on SPARK-22105:
------------------------------------------

Can this PR at https://issues.apache.org/jira/browse/SPARK-21871 alleviate this issue?

> Dataframe has poor performance when computing on many columns with codegen
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-22105
>                 URL: https://issues.apache.org/jira/browse/SPARK-22105
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, SQL
>    Affects Versions: 2.3.0
>            Reporter: Weichen Xu
>            Priority: Minor
>
> Suppose we have a dataframe with many columns (e.g. 100 columns), where each
> column is DoubleType, and we need to compute avg on each column. We will find
> that using dataframe avg is much slower than using RDD.aggregate.
> I observed this issue in this PR (one-pass imputer):
> https://github.com/apache/spark/pull/18902
> I also wrote minimal testing code to reproduce this issue, using a sum
> computation:
> https://github.com/apache/spark/compare/master...WeichenXu123:aggr_test2?expand=1
> When we compute `sum` on 100 `DoubleType` columns, dataframe avg is about 3x
> slower than `RDD.aggregate`, but if we compute only one column, dataframe avg
> is much faster than `RDD.aggregate`.
> The reason for this issue should be a defect in dataframe codegen. Codegen
> inlines everything and generates a large code block. When the column number
> is large (e.g. 100 columns), the generated code is too large, which causes
> the JVM to fail to JIT-compile it and fall back to bytecode interpretation.
> This PR should address the issue:
> https://github.com/apache/spark/pull/19082
> But we need more performance tests against some code in ML after the above
> PR is merged, to check whether this issue is actually fixed.
> This JIRA is used to track this performance issue.
[jira] [Commented] (SPARK-22000) org.codehaus.commons.compiler.CompileException: toString method is not declared
[ https://issues.apache.org/jira/browse/SPARK-22000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16170319#comment-16170319 ]

Kazuaki Ishizaki commented on SPARK-22000:
------------------------------------------

Without sample code, it may take a long time to fix this. Is it possible to attach the whole program, or the code that creates all of the Datasets or DataFrames?

> org.codehaus.commons.compiler.CompileException: toString method is not
> declared
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-22000
>                 URL: https://issues.apache.org/jira/browse/SPARK-22000
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: taiho choi
>
> The error message says that toString is not declared on "value13", which has
> the primitive "long" type in the generated code. I think value13 should be of
> Long type.
> ==error message
> Caused by: org.codehaus.commons.compiler.CompileException: File
> 'generated.java', Line 70, Column 32: failed to compile:
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line
> 70, Column 32: A method named "toString" is not declared in any enclosing
> class nor any supertype, nor through a static import
> /* 033 */   private void apply1_2(InternalRow i) {
> /* 034 */
> /* 035 */
> /* 036 */     boolean isNull11 = i.isNullAt(1);
> /* 037 */     UTF8String value11 = isNull11 ? null : (i.getUTF8String(1));
> /* 038 */     boolean isNull10 = true;
> /* 039 */     java.lang.String value10 = null;
> /* 040 */     if (!isNull11) {
> /* 041 */
> /* 042 */       isNull10 = false;
> /* 043 */       if (!isNull10) {
> /* 044 */
> /* 045 */         Object funcResult4 = null;
> /* 046 */         funcResult4 = value11.toString();
> /* 047 */
> /* 048 */         if (funcResult4 != null) {
> /* 049 */           value10 = (java.lang.String) funcResult4;
> /* 050 */         } else {
> /* 051 */           isNull10 = true;
> /* 052 */         }
> /* 053 */
> /* 054 */
> /* 055 */       }
> /* 056 */     }
> /* 057 */     javaBean.setApp(value10);
> /* 058 */
> /* 059 */
> /* 060 */     boolean isNull13 = i.isNullAt(12);
> /* 061 */     long value13 = isNull13 ? -1L : (i.getLong(12));
> /* 062 */     boolean isNull12 = true;
> /* 063 */     java.lang.String value12 = null;
> /* 064 */     if (!isNull13) {
> /* 065 */
> /* 066 */       isNull12 = false;
> /* 067 */       if (!isNull12) {
> /* 068 */
> /* 069 */         Object funcResult5 = null;
> /* 070 */         funcResult5 = value13.toString();
> /* 071 */
> /* 072 */         if (funcResult5 != null) {
> /* 073 */           value12 = (java.lang.String) funcResult5;
> /* 074 */         } else {
> /* 075 */           isNull12 = true;
> /* 076 */         }
> /* 077 */
> /* 078 */
> /* 079 */       }
> /* 080 */     }
> /* 081 */     javaBean.setReasonCode(value12);
> /* 082 */
> /* 083 */   }
[jira] [Commented] (SPARK-22033) BufferHolder size checks should account for the specific VM array size limitations
[ https://issues.apache.org/jira/browse/SPARK-22033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16169318#comment-16169318 ]

Kazuaki Ishizaki commented on SPARK-22033:
------------------------------------------

I think {{ColumnVector}} and {{HashMapGrowthStrategy}} may have a similar issue. What do you think?

> BufferHolder size checks should account for the specific VM array size
> limitations
> ------------------------------------------------------------------------
>
>                 Key: SPARK-22033
>                 URL: https://issues.apache.org/jira/browse/SPARK-22033
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Vadim Semenov
>            Priority: Minor
>
> A user may get the following OOM error while running a job with heavy
> aggregations:
> ```
> java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>         at org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder.grow(BufferHolder.java:73)
>         at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:235)
>         at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:228)
>         at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
>         at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$generateResultProjection$2.apply(AggregationIterator.scala:254)
>         at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$generateResultProjection$2.apply(AggregationIterator.scala:247)
>         at org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.next(ObjectAggregationIterator.scala:88)
>         at org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.next(ObjectAggregationIterator.scala:33)
>         at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>         at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:167)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
>         at org.apache.spark.scheduler.Task.run(Task.scala:108)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> ```
> [`BufferHolder.grow` tries to create a byte array of `Integer.MAX_VALUE`
> here](https://github.com/apache/spark/blob/v2.2.0/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolder.java#L72),
> but the maximum size of an array depends on the specifics of the VM. The
> safest value seems to be `Integer.MAX_VALUE - 8`:
> http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/java/util/ArrayList.java#l229
> In my JVM:
> ```
> java -version
> openjdk version "1.8.0_141"
> OpenJDK Runtime Environment (build 1.8.0_141-b16)
> OpenJDK 64-Bit Server VM (build 25.141-b16, mixed mode)
> ```
> the max is `new Array[Byte](Integer.MAX_VALUE - 2)`
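A hedged sketch of a VM-aware growth check; the cap mirrors the `Integer.MAX_VALUE - 8` value cited above, but the method shape is illustrative, not BufferHolder's actual code:

{code}
// Largest array size that is safe across common JVMs (see ArrayList's
// MAX_ARRAY_SIZE); some VMs reserve a few header words within the limit.
val MAX_ARRAY_LENGTH: Int = Int.MaxValue - 8

def grownSize(current: Int, neededExtra: Int): Int = {
  val target = current.toLong + neededExtra
  if (target > MAX_ARRAY_LENGTH)
    throw new UnsupportedOperationException(
      s"Cannot grow buffer to $target bytes; max is $MAX_ARRAY_LENGTH")
  // Double when possible, but never past the VM-safe cap.
  math.min(math.max(current.toLong * 2, target), MAX_ARRAY_LENGTH.toLong).toInt
}
{code}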
[jira] [Commented] (SPARK-22000) org.codehaus.commons.compiler.CompileException: toString method is not declared
[ https://issues.apache.org/jira/browse/SPARK-22000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165902#comment-16165902 ]

Kazuaki Ishizaki commented on SPARK-22000:
------------------------------------------

Thank you for the good suggestion. I will try to use {{String.valueOf}}.

> org.codehaus.commons.compiler.CompileException: toString method is not
> declared
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-22000
>                 URL: https://issues.apache.org/jira/browse/SPARK-22000
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: taiho choi
>
> The error message says that toString is not declared on "value13", which has
> the primitive "long" type in the generated code. I think value13 should be of
> Long type.
[jira] [Commented] (SPARK-22000) org.codehaus.commons.compiler.CompileException: toString method is not declared
[ https://issues.apache.org/jira/browse/SPARK-22000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165883#comment-16165883 ]

Kazuaki Ishizaki commented on SPARK-22000:
------------------------------------------

It would be good to generate {{((Long)value13).toString()}} to reduce the number of boxing/unboxing operations. Anyway, as @maropu pointed out, could you please post the query? Then I will create a PR.

> org.codehaus.commons.compiler.CompileException: toString method is not
> declared
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-22000
>                 URL: https://issues.apache.org/jira/browse/SPARK-22000
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: taiho choi
>
> The error message says that toString is not declared on "value13", which has
> the primitive "long" type in the generated code. I think value13 should be of
> Long type.
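For illustration, a hedged sketch of how a codegen template could choose the conversion; names like {{inputVar}} and {{isPrimitive}} are hypothetical, {{String.valueOf}} is the option suggested earlier in the thread, and the boxing cast is the alternative proposed above:

{code}
// Hedged sketch: emit a primitive-safe toString in the generated Java.
def toStringCall(inputVar: String, isPrimitive: Boolean): String =
  if (isPrimitive) s"String.valueOf($inputVar)" // works for long, int, ...
  else s"$inputVar.toString()"                  // fine for objects like UTF8String

// toStringCall("value13", isPrimitive = true) == "String.valueOf(value13)"
// Boxing alternative: s"((java.lang.Long) $inputVar).toString()"
{code}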
[jira] [Commented] (SPARK-21907) NullPointerException in UnsafeExternalSorter.spill()
[ https://issues.apache.org/jira/browse/SPARK-21907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16158995#comment-16158995 ]

Kazuaki Ishizaki commented on SPARK-21907:
------------------------------------------

If you cannot provide a repro, could you please run your program with the latest master branch? SPARK-21319 may alleviate this issue.

> NullPointerException in UnsafeExternalSorter.spill()
> ------------------------------------------------------
>
>                 Key: SPARK-21907
>                 URL: https://issues.apache.org/jira/browse/SPARK-21907
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: Juliusz Sompolski
>
> I see NPE during sorting with the following stacktrace:
> {code}
> java.lang.NullPointerException
>         at org.apache.spark.memory.TaskMemoryManager.getPage(TaskMemoryManager.java:383)
>         at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter$SortComparator.compare(UnsafeInMemorySorter.java:63)
>         at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter$SortComparator.compare(UnsafeInMemorySorter.java:43)
>         at org.apache.spark.util.collection.TimSort.countRunAndMakeAscending(TimSort.java:270)
>         at org.apache.spark.util.collection.TimSort.sort(TimSort.java:142)
>         at org.apache.spark.util.collection.Sorter.sort(Sorter.scala:37)
>         at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.getSortedIterator(UnsafeInMemorySorter.java:345)
>         at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:206)
>         at org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:203)
>         at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:281)
>         at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:90)
>         at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.reset(UnsafeInMemorySorter.java:173)
>         at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:221)
>         at org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:203)
>         at org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:281)
>         at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:90)
>         at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.growPointerArrayIfNecessary(UnsafeExternalSorter.java:349)
>         at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:400)
>         at org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:109)
>         at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown Source)
>         at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
>         at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>         at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
>         at org.apache.spark.sql.execution.RowIteratorFromScala.advanceNext(RowIterator.scala:83)
>         at org.apache.spark.sql.execution.joins.SortMergeJoinScanner.advancedStreamed(SortMergeJoinExec.scala:778)
>         at org.apache.spark.sql.execution.joins.SortMergeJoinScanner.findNextInnerJoinRows(SortMergeJoinExec.scala:685)
>         at org.apache.spark.sql.execution.joins.SortMergeJoinExec$$anonfun$doExecute$1$$anon$2.advanceNext(SortMergeJoinExec.scala:259)
>         at org.apache.spark.sql.execution.RowIteratorToScala.hasNext(RowIterator.scala:68)
>         at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithKeys$(Unknown Source)
>         at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
>         at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>         at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
>         at org.apache.spark.scheduler.Task.run(Task.scala:108)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:346)
>         at java.util
[jira] [Commented] (SPARK-21905) ClassCastException when call sqlContext.sql on temp table
[ https://issues.apache.org/jira/browse/SPARK-21905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16158496#comment-16158496 ]

Kazuaki Ishizaki commented on SPARK-21905:
------------------------------------------

When I ran the following code (I do not have the PointUDT and Point classes, so I used ExamplePoint), I did not see the exception on the master branch or branch-2.2.

{code}
...
import org.apache.spark.sql.catalyst.encoders._
...
import org.apache.spark.sql.types._

test("SPARK-21905") {
  val schema = StructType(List(
    StructField("name", DataTypes.StringType, true),
    StructField("location", new ExamplePointUDT, true)))
  val rowRdd = sqlContext.sparkContext.parallelize(Seq("bluejoe", "alex"), 4)
    .map({ x: String => Row.fromSeq(Seq(x, new ExamplePoint(100, 100))) })
  val dataFrame = sqlContext.createDataFrame(rowRdd, schema)
  dataFrame.createOrReplaceTempView("person")
  sqlContext.sql("SELECT * FROM person").foreach(println(_))
}
{code}

> ClassCastException when call sqlContext.sql on temp table
> -----------------------------------------------------------
>
>                 Key: SPARK-21905
>                 URL: https://issues.apache.org/jira/browse/SPARK-21905
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: bluejoe
>
> {code:java}
> val schema = StructType(List(
>   StructField("name", DataTypes.StringType, true),
>   StructField("location", new PointUDT, true)))
> val rowRdd = sqlContext.sparkContext.parallelize(Seq("bluejoe", "alex"), 4).map({ x: String ⇒ Row.fromSeq(Seq(x, Point(100, 100))) });
> val dataFrame = sqlContext.createDataFrame(rowRdd, schema)
> dataFrame.createOrReplaceTempView("person");
> sqlContext.sql("SELECT * FROM person").foreach(println(_));
> {code}
> The last statement throws an exception:
> {code:java}
> Caused by: java.lang.ClassCastException:
> org.apache.spark.sql.catalyst.expressions.GenericRow cannot be cast to
> org.apache.spark.sql.catalyst.InternalRow
>         at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.evalIfFalseExpr1$(Unknown Source)
>         at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
>         at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:287)
>         ... 18 more
> {code}
[jira] [Commented] (SPARK-21946) Flaky test: InMemoryCatalogedDDLSuite.`alter table: rename cached table`
[ https://issues.apache.org/jira/browse/SPARK-21946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16158044#comment-16158044 ]

Kazuaki Ishizaki commented on SPARK-21946:
------------------------------------------

If no one is working on this, I will create a PR.

> Flaky test: InMemoryCatalogedDDLSuite.`alter table: rename cached table`
> --------------------------------------------------------------------------
>
>                 Key: SPARK-21946
>                 URL: https://issues.apache.org/jira/browse/SPARK-21946
>             Project: Spark
>          Issue Type: Bug
>          Components: Tests
>    Affects Versions: 2.2.0
>            Reporter: Dongjoon Hyun
>            Priority: Minor
>
> According to the [Apache Spark Jenkins
> History|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7/lastCompletedBuild/testReport/org.apache.spark.sql.execution.command/InMemoryCatalogedDDLSuite/alter_table__rename_cached_table/history/],
> InMemoryCatalogedDDLSuite.`alter table: rename cached table` is very flaky.
> We had better stabilize this.
> {code}
> - alter table: rename cached table !!! CANCELED !!!
>   Array([2,2], [1,1]) did not equal Array([1,1], [2,2]) bad test: wrong data
>   (DDLSuite.scala:786)
> {code}
[jira] [Commented] (SPARK-21907) NullPointerException in UnsafeExternalSorter.spill()
[ https://issues.apache.org/jira/browse/SPARK-21907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156015#comment-16156015 ] Kazuaki Ishizaki commented on SPARK-21907: -- Thank you for your report. Could you please attach a program that can reproduce this issue? > NullPointerException in UnsafeExternalSorter.spill() > > > Key: SPARK-21907 > URL: https://issues.apache.org/jira/browse/SPARK-21907 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Juliusz Sompolski > > I see NPE during sorting with the following stacktrace: > {code} > java.lang.NullPointerException > at > org.apache.spark.memory.TaskMemoryManager.getPage(TaskMemoryManager.java:383) > at > org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter$SortComparator.compare(UnsafeInMemorySorter.java:63) > at > org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter$SortComparator.compare(UnsafeInMemorySorter.java:43) > at > org.apache.spark.util.collection.TimSort.countRunAndMakeAscending(TimSort.java:270) > at org.apache.spark.util.collection.TimSort.sort(TimSort.java:142) > at org.apache.spark.util.collection.Sorter.sort(Sorter.scala:37) > at > org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.getSortedIterator(UnsafeInMemorySorter.java:345) > at > org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:206) > at > org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:203) > at > org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:281) > at > org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:90) > at > org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.reset(UnsafeInMemorySorter.java:173) > at > org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:221) > at > org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:203) > at > org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:281) > at > org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:90) > at > org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.growPointerArrayIfNecessary(UnsafeExternalSorter.java:349) > at > org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:400) > at > org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:109) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395) > at > org.apache.spark.sql.execution.RowIteratorFromScala.advanceNext(RowIterator.scala:83) > at > org.apache.spark.sql.execution.joins.SortMergeJoinScanner.advancedStreamed(SortMergeJoinExec.scala:778) > at > org.apache.spark.sql.execution.joins.SortMergeJoinScanner.findNextInnerJoinRows(SortMergeJoinExec.scala:685) > at > org.apache.spark.sql.execution.joins.SortMergeJoinExec$$anonfun$doExecute$1$$anon$2.advanceNext(SortMergeJoinExec.scala:259) > at > 
org.apache.spark.sql.execution.RowIteratorToScala.hasNext(RowIterator.scala:68) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithKeys$(Unknown > Source) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53) > at org.apache.spark.scheduler.Task.run(Task.scala:108) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:346) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(T
[jira] [Commented] (SPARK-21894) Some Netty errors do not propagate to the top level driver
[ https://issues.apache.org/jira/browse/SPARK-21894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16151930#comment-16151930 ] Kazuaki Ishizaki commented on SPARK-21894: -- Thank you for reporting this issue. Could you please attach a smaller program that can reproduce this problem? > Some Netty errors do not propagate to the top level driver > -- > > Key: SPARK-21894 > URL: https://issues.apache.org/jira/browse/SPARK-21894 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.0 >Reporter: Charles Allen > > We have an environment with Netty 4.1 ( > https://issues.apache.org/jira/browse/SPARK-19552 for some context) and the > following error occurs. The reason THIS issue is being filed is because this > error leaves the Spark workload in a bad state where it does not make any > progress, and does not shut down. > The expected behavior is that the spark job would throw an exception that can > be caught by the driving application. > {code} > 017-09-01T16:13:32,175 ERROR [shuffle-server-3-2] > org.apache.spark.network.server.TransportRequestHandler - Error sending > result StreamResponse{streamId=/jars/lz4-1.3.0.jar, byteCount=236880, > body=FileSegmentManagedBuffer{file=/Users/charlesallen/.m2/repository/net/jpountz/lz4/lz4/1.3.0/lz4-1.3.0.jar, > offset=0, length=236880}} to /192.168.59.3:56703; closing connection > java.lang.AbstractMethodError > at io.netty.util.ReferenceCountUtil.touch(ReferenceCountUtil.java:73) > ~[netty-all-4.1.11.Final.jar:4.1.11.Final] > at > io.netty.channel.DefaultChannelPipeline.touch(DefaultChannelPipeline.java:107) > ~[netty-all-4.1.11.Final.jar:4.1.11.Final] > at > io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:810) > ~[netty-all-4.1.11.Final.jar:4.1.11.Final] > at > io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:723) > ~[netty-all-4.1.11.Final.jar:4.1.11.Final] > at > io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:111) > ~[netty-all-4.1.11.Final.jar:4.1.11.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738) > ~[netty-all-4.1.11.Final.jar:4.1.11.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:730) > ~[netty-all-4.1.11.Final.jar:4.1.11.Final] > at > io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:816) > ~[netty-all-4.1.11.Final.jar:4.1.11.Final] > at > io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:723) > ~[netty-all-4.1.11.Final.jar:4.1.11.Final] > at > io.netty.handler.timeout.IdleStateHandler.write(IdleStateHandler.java:305) > ~[netty-all-4.1.11.Final.jar:4.1.11.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738) > ~[netty-all-4.1.11.Final.jar:4.1.11.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:801) > ~[netty-all-4.1.11.Final.jar:4.1.11.Final] > at > io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:814) > ~[netty-all-4.1.11.Final.jar:4.1.11.Final] > at > io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:794) > ~[netty-all-4.1.11.Final.jar:4.1.11.Final] > at > io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:831) > 
~[netty-all-4.1.11.Final.jar:4.1.11.Final] > at > io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1032) > ~[netty-all-4.1.11.Final.jar:4.1.11.Final] > at > io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:296) > ~[netty-all-4.1.11.Final.jar:4.1.11.Final] > at > org.apache.spark.network.server.TransportRequestHandler.respond(TransportRequestHandler.java:194) > [spark-network-common_2.11-2.1.0-mmx9.jar:2.1.0-mmx9] > at > org.apache.spark.network.server.TransportRequestHandler.processStreamRequest(TransportRequestHandler.java:150) > [spark-network-common_2.11-2.1.0-mmx9.jar:2.1.0-mmx9] > at > org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:111) > [spark-network-common_2.11-2.1.0-mmx9.jar:2.1.0-mmx9] > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:119) > [spark-network-common_2.11-2.1.0-mmx9.jar:2.1.0-mmx9] > at > org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51) >
[jira] [Commented] (SPARK-18016) Code Generation: Constant Pool Past Limit for Wide/Nested Dataset
[ https://issues.apache.org/jira/browse/SPARK-18016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16141650#comment-16141650 ] Kazuaki Ishizaki commented on SPARK-18016: -- The issue {{Caused by: org.codehaus.janino.JaninoRuntimeException: Constant pool for class org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection has grown past JVM limit of 0x}} will be addressed by [this PR|https://github.com/apache/spark/pull/16648]. > Code Generation: Constant Pool Past Limit for Wide/Nested Dataset > - > > Key: SPARK-18016 > URL: https://issues.apache.org/jira/browse/SPARK-18016 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 >Reporter: Aleksander Eskilson >Assignee: Aleksander Eskilson > Fix For: 2.3.0 > > > When attempting to encode collections of large Java objects to Datasets > having very wide or deeply nested schemas, code generation can fail, yielding: > {code} > Caused by: org.codehaus.janino.JaninoRuntimeException: Constant pool for > class > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection > has grown past JVM limit of 0x > at > org.codehaus.janino.util.ClassFile.addToConstantPool(ClassFile.java:499) > at > org.codehaus.janino.util.ClassFile.addConstantNameAndTypeInfo(ClassFile.java:439) > at > org.codehaus.janino.util.ClassFile.addConstantMethodrefInfo(ClassFile.java:358) > at > org.codehaus.janino.UnitCompiler.writeConstantMethodrefInfo(UnitCompiler.java:4) > at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4547) > at org.codehaus.janino.UnitCompiler.access$7500(UnitCompiler.java:206) > at > org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:3774) > at > org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:3762) > at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:4328) > at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3762) > at > org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4933) > at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:3180) > at org.codehaus.janino.UnitCompiler.access$5000(UnitCompiler.java:206) > at > org.codehaus.janino.UnitCompiler$9.visitMethodInvocation(UnitCompiler.java:3151) > at > org.codehaus.janino.UnitCompiler$9.visitMethodInvocation(UnitCompiler.java:3139) > at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:4328) > at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3139) > at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2112) > at org.codehaus.janino.UnitCompiler.access$1700(UnitCompiler.java:206) > at > org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1377) > at > org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1370) > at org.codehaus.janino.Java$ExpressionStatement.accept(Java.java:2558) > at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:1370) > at > org.codehaus.janino.UnitCompiler.compileStatements(UnitCompiler.java:1450) > at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:2811) > at > org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1262) > at > org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1234) > at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:538) > at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:890) > at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:894) > at 
org.codehaus.janino.UnitCompiler.access$600(UnitCompiler.java:206) > at > org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:377) > at > org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:369) > at > org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1128) > at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:369) > at > org.codehaus.janino.UnitCompiler.compileDeclaredMemberTypes(UnitCompiler.java:1209) > at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:564) > at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:420) > at org.codehaus.janino.UnitCompiler.access$400(UnitCompiler.java:206) > at > org.codehaus.janino.UnitCompiler$2.visitPackageMemberClassDeclaration(UnitCompiler.java:374) > at > org.codehaus.janino.UnitCompiler$2.visitPackageMemberClassDeclaration(UnitCompiler.java:369) > at > org.codehaus.janino.Java$AbstractPackageMemberCl
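The failure above is easy to trigger with a sufficiently wide projection. The following is a minimal sketch (my own illustration, not code from this issue; the column count is an assumed value chosen to stress code generation): each split method that codegen emits adds entries to the generated class's constant pool, so a wide enough schema can push it past the 0xFFFF limit on affected versions.
{code}
import org.apache.spark.sql.SparkSession

// Sketch: build a very wide projection so that codegen splits the generated
// code into many methods; each method reference consumes constant pool entries.
val spark = SparkSession.builder().master("local[1]").getOrCreate()
val numCols = 4000 // illustrative width; the exact breaking point depends on the Spark version

val wide = spark.range(1).selectExpr((0 until numCols).map(i => s"id + $i AS c$i"): _*)
wide.selectExpr(wide.columns.map(c => s"$c + 1 AS ${c}_plus"): _*).collect()
{code}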
[jira] [Commented] (SPARK-21828) org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB...again
[ https://issues.apache.org/jira/browse/SPARK-21828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16141361#comment-16141361 ] Kazuaki Ishizaki commented on SPARK-21828: -- Thank you for your report. Some fixes solved this problem in Spark 2.2, but they were not backported to Spark 2.1. If you need a backport to 2.1, please let us know here. I will start by identifying the root cause of this issue and then backport the relevant PR. > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" > grows beyond 64 KB...again > - > > Key: SPARK-21828 > URL: https://issues.apache.org/jira/browse/SPARK-21828 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 >Reporter: Otis Smart >Priority: Critical > > Hello! > 1. I encounter a similar issue (see below text) on Pyspark 2.2 (e.g., > dataframe with ~5 rows x 1100+ columns as input to ".fit()" method of > CrossValidator() that includes Pipeline() that includes StringIndexer(), > VectorAssembler() and DecisionTreeClassifier()). > 2. Was the aforementioned patch (aka > fix(https://github.com/apache/spark/pull/15480) not included in the latest > release; what are the reason and (source) of and solution to this persistent > issue please? > py4j.protocol.Py4JJavaError: An error occurred while calling o9396.fit. > : org.apache.spark.SparkException: Job aborted due to stage failure: Task 38 > in stage 18.0 failed 4 times, most recent failure: Lost task 38.3 in stage > 18.0 (TID 1996, ip-10-0-14-83.ec2.internal, executor 4): > java.util.concurrent.ExecutionException: java.lang.Exception: failed to > compile: org.codehaus.janino.JaninoRuntimeException: Code of method > "compare(Lorg/apache/spark/sql/catalyst/InternalRow;Lorg/apache/spark/sql/catalyst/InternalRow;)I" > of class > "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" > grows beyond 64 KB > /* 001 */ public SpecificOrdering generate(Object[] references) > { /* 002 */ return new SpecificOrdering(references); /* 003 */ } > /* 004 */ > /* 005 */ class SpecificOrdering extends > org.apache.spark.sql.catalyst.expressions.codegen.BaseOrdering { > /* 006 */ > /* 007 */ private Object[] references; > /* 008 */ > /* 009 */ > /* 010 */ public SpecificOrdering(Object[] references) > { /* 011 */ this.references = references; /* 012 */ /* 013 */ } > /* 014 */ > /* 015 */ > /* 016 */ > /* 017 */ public int compare(InternalRow a, InternalRow b) { > /* 018 */ InternalRow i = null; // Holds current row being evaluated. > /* 019 */ > /* 020 */ i = a; > /* 021 */ boolean isNullA; > /* 022 */ double primitiveA; > /* 023 */ > { /* 024 */ /* 025 */ double value = i.getDouble(0); /* 026 */ isNullA = > false; /* 027 */ primitiveA = value; /* 028 */ } > /* 029 */ i = b; > /* 030 */ boolean isNullB; > /* 031 */ double primitiveB; > /* 032 */ > { /* 033 */ /* 034 */ double value = i.getDouble(0); /* 035 */ isNullB = > false; /* 036 */ primitiveB = value; /* 037 */ } > /* 038 */ if (isNullA && isNullB) > { /* 039 */ // Nothing /* 040 */ } > else if (isNullA) > { /* 041 */ return -1; /* 042 */ } > else if (isNullB) > { /* 043 */ return 1; /* 044 */ } > else { > /* 045 */ int comp = > org.apache.spark.util.Utils.nanSafeCompareDoubles(primitiveA, primitiveB); > /* 046 */ if (comp != 0) > { /* 047 */ return comp; /* 048 */ } > /* 049 */ } > /* 050 */ > /* 051 */ > ... 
-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
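To make the failure mode concrete, here is a hedged sketch (my own illustration, not the reporter's ML pipeline; the column count mirrors the "1100+ columns" in the report): sorting by every column of a wide DataFrame makes codegen emit one comparison block per sort key inside a single {{compare}} method, which can exceed the JVM's 64 KB per-method bytecode limit on unpatched versions.
{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[1]").getOrCreate()
// ~1100 random-valued columns, as in the report.
val df = spark.range(5).selectExpr((0 until 1100).map(i => s"rand() AS c$i"): _*)
// Sorting by all columns generates SpecificOrdering.compare with one block per key.
df.sort(df.columns.map(col): _*).collect()
{code}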
[jira] [Commented] (SPARK-21828) org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB...again
[ https://issues.apache.org/jira/browse/SPARK-21828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16140282#comment-16140282 ] Kazuaki Ishizaki commented on SPARK-21828: -- Thank you for reporting this problem. First, IIUC, this PR (https://github.com/apache/spark/pull/15480) has been included in the latest release. Thus, the test case "SPARK-16845..." in {{OrderingSuite.scala}} does not fail. Could you please post a program that can reproduce this issue? Then I will investigate it. > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" > grows beyond 64 KB...again > - > > Key: SPARK-21828 > URL: https://issues.apache.org/jira/browse/SPARK-21828 > Project: Spark > Issue Type: Bug > Components: ML, SQL >Affects Versions: 2.2.0 >Reporter: Otis Smart >Priority: Critical > > Hello! > 1. I encounter a similar issue (see below text) on Pyspark 2.2 (e.g., > dataframe with ~5 rows x 1100+ columns as input to ".fit()" method of > CrossValidator() that includes Pipeline() that includes StringIndexer(), > VectorAssembler() and DecisionTreeClassifier()). > 2. Was the aforementioned patch (aka > fix(https://github.com/apache/spark/pull/15480) not included in the latest > release; what are the reason and (source) of and solution to this persistent > issue please? > py4j.protocol.Py4JJavaError: An error occurred while calling o9396.fit. > : org.apache.spark.SparkException: Job aborted due to stage failure: Task 38 > in stage 18.0 failed 4 times, most recent failure: Lost task 38.3 in stage > 18.0 (TID 1996, ip-10-0-14-83.ec2.internal, executor 4): > java.util.concurrent.ExecutionException: java.lang.Exception: failed to > compile: org.codehaus.janino.JaninoRuntimeException: Code of method > "compare(Lorg/apache/spark/sql/catalyst/InternalRow;Lorg/apache/spark/sql/catalyst/InternalRow;)I" > of class > "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" > grows beyond 64 KB > /* 001 */ public SpecificOrdering generate(Object[] references) > { /* 002 */ return new SpecificOrdering(references); /* 003 */ } > /* 004 */ > /* 005 */ class SpecificOrdering extends > org.apache.spark.sql.catalyst.expressions.codegen.BaseOrdering { > /* 006 */ > /* 007 */ private Object[] references; > /* 008 */ > /* 009 */ > /* 010 */ public SpecificOrdering(Object[] references) > { /* 011 */ this.references = references; /* 012 */ /* 013 */ } > /* 014 */ > /* 015 */ > /* 016 */ > /* 017 */ public int compare(InternalRow a, InternalRow b) { > /* 018 */ InternalRow i = null; // Holds current row being evaluated. > /* 019 */ > /* 020 */ i = a; > /* 021 */ boolean isNullA; > /* 022 */ double primitiveA; > /* 023 */ > { /* 024 */ /* 025 */ double value = i.getDouble(0); /* 026 */ isNullA = > false; /* 027 */ primitiveA = value; /* 028 */ } > /* 029 */ i = b; > /* 030 */ boolean isNullB; > /* 031 */ double primitiveB; > /* 032 */ > { /* 033 */ /* 034 */ double value = i.getDouble(0); /* 035 */ isNullB = > false; /* 036 */ primitiveB = value; /* 037 */ } > /* 038 */ if (isNullA && isNullB) > { /* 039 */ // Nothing /* 040 */ } > else if (isNullA) > { /* 041 */ return -1; /* 042 */ } > else if (isNullB) > { /* 043 */ return 1; /* 044 */ } > else { > /* 045 */ int comp = > org.apache.spark.util.Utils.nanSafeCompareDoubles(primitiveA, primitiveB); > /* 046 */ if (comp != 0) > { /* 047 */ return comp; /* 048 */ } > /* 049 */ } > /* 050 */ > /* 051 */ > ... 
-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21750) Use arrow 0.6.0
[ https://issues.apache.org/jira/browse/SPARK-21750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16136920#comment-16136920 ] Kazuaki Ishizaki commented on SPARK-21750: -- Closed this, since upgrading Arrow requires upgrading the Jenkins environment on the Python side. For now, it is not necessary to upgrade Arrow on the Python side. Details are in the discussion on the PR. > Use arrow 0.6.0 > --- > > Key: SPARK-21750 > URL: https://issues.apache.org/jira/browse/SPARK-21750 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Kazuaki Ishizaki >Priority: Minor > > Since [Arrow 0.6.0|http://arrow.apache.org/release/0.6.0.html] has been > released, use the latest one -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-21750) Use arrow 0.6.0
[ https://issues.apache.org/jira/browse/SPARK-21750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki closed SPARK-21750. Resolution: Won't Fix > Use arrow 0.6.0 > --- > > Key: SPARK-21750 > URL: https://issues.apache.org/jira/browse/SPARK-21750 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Kazuaki Ishizaki >Priority: Minor > > Since [Arrow 0.6.0|http://arrow.apache.org/release/0.6.0.html] has been > released, use the latest one -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21794) exception about reading task serial data(broadcast) value when the storage memory is not enough to unroll
[ https://issues.apache.org/jira/browse/SPARK-21794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16134440#comment-16134440 ] Kazuaki Ishizaki commented on SPARK-21794: -- Thank you for reporting this issue. Could you please attach a program that can reproduce this problem? > exception about reading task serial data(broadcast) value when the storage > memory is not enough to unroll > - > > Key: SPARK-21794 > URL: https://issues.apache.org/jira/browse/SPARK-21794 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.1, 2.1.1 >Reporter: roncenzhao > Attachments: error stack.png > > > ``` > 17/08/09 19:27:43 ERROR Utils: Exception encountered > java.util.NoSuchElementException > at > org.apache.spark.util.collection.PrimitiveVector$$anon$1.next(PrimitiveVector.scala:58) > at > org.apache.spark.storage.memory.PartiallyUnrolledIterator.next(MemoryStore.scala:697) > at > org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:30) > at > org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$2.apply(TorrentBroadcast.scala:178) > at > org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$2.apply(TorrentBroadcast.scala:178) > at scala.Option.map(Option.scala:146) > at > org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:178) > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1276) > at > org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:174) > at > org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:65) > at > org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:65) > at > org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:89) > at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:72) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) > at org.apache.spark.scheduler.Task.run(Task.scala:86) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > 17/08/09 19:27:43 INFO UnifiedMemoryManager: Will not store broadcast_5 as > the required space (1048576 bytes) exceeds our memory limit (878230 bytes) > 17/08/09 19:27:43 WARN MemoryStore: Failed to reserve initial memory > threshold of 1024.0 KB for computing block broadcast_5 in memory. > 17/08/09 19:27:43 WARN MemoryStore: Not enough space to cache broadcast_5 in > memory! (computed 384.0 B so far) > 17/08/09 19:27:43 INFO MemoryStore: Memory use = 857.6 KB (blocks) + 0.0 B > (scratch space shared across 0 tasks(s)) = 857.6 KB. Storage limit = 857.6 KB. 
> 17/08/09 19:27:43 ERROR Utils: Exception encountered > java.util.NoSuchElementException > at > org.apache.spark.util.collection.PrimitiveVector$$anon$1.next(PrimitiveVector.scala:58) > at > org.apache.spark.storage.memory.PartiallyUnrolledIterator.next(MemoryStore.scala:697) > at > org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:30) > at > org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$2.apply(TorrentBroadcast.scala:178) > at > org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$2.apply(TorrentBroadcast.scala:178) > at scala.Option.map(Option.scala:146) > at > org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:178) > at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1276) > at > org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:174) > at > org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:65) > at > org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:65) > at > org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:89) > at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:72) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) > at org.apache.spark.scheduler.Task.run(Task.scala:86) > a
[jira] [Commented] (SPARK-21776) How to use the memory-mapped file on Spark??
[ https://issues.apache.org/jira/browse/SPARK-21776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131681#comment-16131681 ] Kazuaki Ishizaki commented on SPARK-21776: -- Is this a question? If so, it would be better to send a message to u...@spark.apache.org or d...@spark.apache.org. > How to use the memory-mapped file on Spark?? > > > Key: SPARK-21776 > URL: https://issues.apache.org/jira/browse/SPARK-21776 > Project: Spark > Issue Type: Bug > Components: Block Manager, Documentation, Input/Output, Spark Core >Affects Versions: 2.1.1 > Environment: Spark 2.1.1 > Scala 2.11.8 >Reporter: zhaP524 > Attachments: screenshot-1.png, screenshot-2.png > > > In production, we have to fully load an HBase table in Spark, joined with a > dimension table, to generate business data. Because the base table is loaded > in full, memory pressure is very high, and I want to know whether Spark can > use memory-mapped files to deal with this. Is there such a mechanism? How do > you use it? > I also found a Spark parameter, spark.storage.memoryMapThreshold=2m, but it > is not very clear what this parameter is used for. > There are putBytes and getBytes methods in DiskStore.scala in the Spark > source code; are these related to the memory-mapped files mentioned above? > How should this be understood? > Please let me know if anything is unclear. Best wishes! -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
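For reference, {{spark.storage.memoryMapThreshold}} is the block size above which Spark memory-maps a block when reading it back from disk; smaller blocks are read with ordinary I/O because mapping has a fixed setup cost. A minimal sketch of setting it explicitly (2m is the default, shown here only as an illustrative value):
{code}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  // Blocks larger than this threshold are memory-mapped when read from disk.
  .config("spark.storage.memoryMapThreshold", "2m")
  .getOrCreate()
{code}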
[jira] [Commented] (SPARK-21720) Filter predicate with many conditions throw stackoverflow error
[ https://issues.apache.org/jira/browse/SPARK-21720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16130231#comment-16130231 ] Kazuaki Ishizaki commented on SPARK-21720: -- I identified issues in {{predicates.scala}}. I am creating fixes. > Filter predicate with many conditions throw stackoverflow error > --- > > Key: SPARK-21720 > URL: https://issues.apache.org/jira/browse/SPARK-21720 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: srinivasan > > When trying to filter on dataset with many predicate conditions on both spark > sql and dataset filter transformation as described below, spark throws a > stackoverflow exception > Case 1: Filter Transformation on Data > Dataset filter = sourceDataset.filter(String.format("not(%s)", > buildQuery())); > filter.show(); > where buildQuery() returns > Field1 = "" and Field2 = "" and Field3 = "" and Field4 = "" and Field5 = > "" and BLANK_5 = "" and Field7 = "" and Field8 = "" and Field9 = "" and > Field10 = "" and Field11 = "" and Field12 = "" and Field13 = "" and > Field14 = "" and Field15 = "" and Field16 = "" and Field17 = "" and > Field18 = "" and Field19 = "" and Field20 = "" and Field21 = "" and > Field22 = "" and Field23 = "" and Field24 = "" and Field25 = "" and > Field26 = "" and Field27 = "" and Field28 = "" and Field29 = "" and > Field30 = "" and Field31 = "" and Field32 = "" and Field33 = "" and > Field34 = "" and Field35 = "" and Field36 = "" and Field37 = "" and > Field38 = "" and Field39 = "" and Field40 = "" and Field41 = "" and > Field42 = "" and Field43 = "" and Field44 = "" and Field45 = "" and > Field46 = "" and Field47 = "" and Field48 = "" and Field49 = "" and > Field50 = "" and Field51 = "" and Field52 = "" and Field53 = "" and > Field54 = "" and Field55 = "" and Field56 = "" and Field57 = "" and > Field58 = "" and Field59 = "" and Field60 = "" and Field61 = "" and > Field62 = "" and Field63 = "" and Field64 = "" and Field65 = "" and > Field66 = "" and Field67 = "" and Field68 = "" and Field69 = "" and > Field70 = "" and Field71 = "" and Field72 = "" and Field73 = "" and > Field74 = "" and Field75 = "" and Field76 = "" and Field77 = "" and > Field78 = "" and Field79 = "" and Field80 = "" and Field81 = "" and > Field82 = "" and Field83 = "" and Field84 = "" and Field85 = "" and > Field86 = "" and Field87 = "" and Field88 = "" and Field89 = "" and > Field90 = "" and Field91 = "" and Field92 = "" and Field93 = "" and > Field94 = "" and Field95 = "" and Field96 = "" and Field97 = "" and > Field98 = "" and Field99 = "" and Field100 = "" and Field101 = "" and > Field102 = "" and Field103 = "" and Field104 = "" and Field105 = "" and > Field106 = "" and Field107 = "" and Field108 = "" and Field109 = "" and > Field110 = "" and Field111 = "" and Field112 = "" and Field113 = "" and > Field114 = "" and Field115 = "" and Field116 = "" and Field117 = "" and > Field118 = "" and Field119 = "" and Field120 = "" and Field121 = "" and > Field122 = "" and Field123 = "" and Field124 = "" and Field125 = "" and > Field126 = "" and Field127 = "" and Field128 = "" and Field129 = "" and > Field130 = "" and Field131 = "" and Field132 = "" and Field133 = "" and > Field134 = "" and Field135 = "" and Field136 = "" and Field137 = "" and > Field138 = "" and Field139 = "" and Field140 = "" and Field141 = "" and > Field142 = "" and Field143 = "" and Field144 = "" and Field145 = "" and > Field146 = "" and Field147 = "" and Field148 = "" and Field149 = "" and > Field150 = "" and 
Field151 = "" and Field152 = "" and Field153 = "" and > Field154 = "" and Field155 = "" and Field156 = "" and Field157 = "" and > Field158 = "" and Field159 = "" and Field160 = "" and Field161 = "" and > Field162 = "" and Field163 = "" and Field164 = "" and Field165 = "" and > Field166 = "" and Field167 = "" and Field168 = "" and Field169 = "" and > Field170 = "" and Field171 = "" and Field172 = "" and Field173 = "" and > Field174 = "" and Field175 = "" and Field176 = "" and Field177 = "" and > Field178 = "" and Field179 = "" and Field180 = "" and Field181 = "" and > Field182 = "" and Field183 = "" and Field184 = "" and Field185 = "" and > Field186 = "" and Field187 = "" and Field188 = "" and Field189 = "" and > Field190 = "" and Field191 = "" and Field192 = "" and Field193 = "" and > Field194 = "" and Field195 = "" and Field196 = "" and Field197 = "" and > Field198 = "" and Field199 = "" and Field200 = "" and Field201 = "" and > Field202 = "" and Field203 = "" and Field204
[jira] [Created] (SPARK-21751) CodeGenerator.splitExpressions counts code size more precisely
Kazuaki Ishizaki created SPARK-21751: Summary: CodeGenerator.splitExpressions counts code size more precisely Key: SPARK-21751 URL: https://issues.apache.org/jira/browse/SPARK-21751 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.2.0 Reporter: Kazuaki Ishizaki Priority: Minor Currently, {{CodeGenerator.splitExpressions}} splits statements if their total length is more than 1200 characters. This length may include comments or empty lines. It would be good to exclude comments and empty lines, to reduce the number of generated methods in a class. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
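A minimal sketch of the proposed counting (my own illustration of the idea in the description, not Spark's actual implementation): measure only effective code, skipping blank and comment-only lines, before comparing against the 1200-character split threshold.
{code}
// Illustrative helper: code length excluding blank and comment-only lines.
def effectiveCodeLength(code: String): Int =
  code.split("\n")
    .map(_.trim)
    .filterNot(l => l.isEmpty || l.startsWith("//") || l.startsWith("/*") || l.startsWith("*"))
    .map(_.length)
    .sum

// Split decision using the threshold mentioned in the description.
def shouldSplit(code: String): Boolean = effectiveCodeLength(code) > 1200
{code}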
[jira] [Commented] (SPARK-21750) Use arrow 0.6.0
[ https://issues.apache.org/jira/browse/SPARK-21750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129133#comment-16129133 ] Kazuaki Ishizaki commented on SPARK-21750: -- Waiting for the release to become available on mvnrepository. > Use arrow 0.6.0 > --- > > Key: SPARK-21750 > URL: https://issues.apache.org/jira/browse/SPARK-21750 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Kazuaki Ishizaki >Priority: Minor > > Since [Arrow 0.6.0|http://arrow.apache.org/release/0.6.0.html] has been > released, use the latest one -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-21750) Use arrow 0.6.0
Kazuaki Ishizaki created SPARK-21750: Summary: Use arrow 0.6.0 Key: SPARK-21750 URL: https://issues.apache.org/jira/browse/SPARK-21750 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.3.0 Reporter: Kazuaki Ishizaki Priority: Minor Since [Arrow 0.6.0|http://arrow.apache.org/release/0.6.0.html] has been released, use the latest one -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21720) Filter predicate with many conditions throw stackoverflow error
[ https://issues.apache.org/jira/browse/SPARK-21720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127477#comment-16127477 ] Kazuaki Ishizaki commented on SPARK-21720: -- In this case, to add JVM option {{-Xss512m}} eliminates this exception and this works well. When the number of fields is 1024, I got the following exception: {code} 08:41:40.022 ERROR org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator: failed to compile: org.codehaus.janino.JaninoRuntimeException: Code of method "apply(Lorg/apache/spark/sql/catalyst/InternalRow;)Lorg/apache/spark/sql/catalyst/expressions/UnsafeRow;" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection" grows beyond 64 KB ... {code} I am working for solving this 64KB problem. > Filter predicate with many conditions throw stackoverflow error > --- > > Key: SPARK-21720 > URL: https://issues.apache.org/jira/browse/SPARK-21720 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: srinivasan > > When trying to filter on dataset with many predicate conditions on both spark > sql and dataset filter transformation as described below, spark throws a > stackoverflow exception > Case 1: Filter Transformation on Data > Dataset filter = sourceDataset.filter(String.format("not(%s)", > buildQuery())); > filter.show(); > where buildQuery() returns > Field1 = "" and Field2 = "" and Field3 = "" and Field4 = "" and Field5 = > "" and BLANK_5 = "" and Field7 = "" and Field8 = "" and Field9 = "" and > Field10 = "" and Field11 = "" and Field12 = "" and Field13 = "" and > Field14 = "" and Field15 = "" and Field16 = "" and Field17 = "" and > Field18 = "" and Field19 = "" and Field20 = "" and Field21 = "" and > Field22 = "" and Field23 = "" and Field24 = "" and Field25 = "" and > Field26 = "" and Field27 = "" and Field28 = "" and Field29 = "" and > Field30 = "" and Field31 = "" and Field32 = "" and Field33 = "" and > Field34 = "" and Field35 = "" and Field36 = "" and Field37 = "" and > Field38 = "" and Field39 = "" and Field40 = "" and Field41 = "" and > Field42 = "" and Field43 = "" and Field44 = "" and Field45 = "" and > Field46 = "" and Field47 = "" and Field48 = "" and Field49 = "" and > Field50 = "" and Field51 = "" and Field52 = "" and Field53 = "" and > Field54 = "" and Field55 = "" and Field56 = "" and Field57 = "" and > Field58 = "" and Field59 = "" and Field60 = "" and Field61 = "" and > Field62 = "" and Field63 = "" and Field64 = "" and Field65 = "" and > Field66 = "" and Field67 = "" and Field68 = "" and Field69 = "" and > Field70 = "" and Field71 = "" and Field72 = "" and Field73 = "" and > Field74 = "" and Field75 = "" and Field76 = "" and Field77 = "" and > Field78 = "" and Field79 = "" and Field80 = "" and Field81 = "" and > Field82 = "" and Field83 = "" and Field84 = "" and Field85 = "" and > Field86 = "" and Field87 = "" and Field88 = "" and Field89 = "" and > Field90 = "" and Field91 = "" and Field92 = "" and Field93 = "" and > Field94 = "" and Field95 = "" and Field96 = "" and Field97 = "" and > Field98 = "" and Field99 = "" and Field100 = "" and Field101 = "" and > Field102 = "" and Field103 = "" and Field104 = "" and Field105 = "" and > Field106 = "" and Field107 = "" and Field108 = "" and Field109 = "" and > Field110 = "" and Field111 = "" and Field112 = "" and Field113 = "" and > Field114 = "" and Field115 = "" and Field116 = "" and Field117 = "" and > Field118 = "" and Field119 = "" and Field120 = "" and Field121 = "" and > 
Field122 = "" and Field123 = "" and Field124 = "" and Field125 = "" and > Field126 = "" and Field127 = "" and Field128 = "" and Field129 = "" and > Field130 = "" and Field131 = "" and Field132 = "" and Field133 = "" and > Field134 = "" and Field135 = "" and Field136 = "" and Field137 = "" and > Field138 = "" and Field139 = "" and Field140 = "" and Field141 = "" and > Field142 = "" and Field143 = "" and Field144 = "" and Field145 = "" and > Field146 = "" and Field147 = "" and Field148 = "" and Field149 = "" and > Field150 = "" and Field151 = "" and Field152 = "" and Field153 = "" and > Field154 = "" and Field155 = "" and Field156 = "" and Field157 = "" and > Field158 = "" and Field159 = "" and Field160 = "" and Field161 = "" and > Field162 = "" and Field163 = "" and Field164 = "" and Field165 = "" and > Field166 = "" and Field167 = "" and Field168 = "" and Field169 = "" and > Field170 = "" and Field171 = "" and Field172 = "" and Field173 = "" and > Field174 = "" and Field175 = "" and Field176 = "" and
[jira] [Comment Edited] (SPARK-21720) Filter predicate with many conditions throw stackoverflow error
[ https://issues.apache.org/jira/browse/SPARK-21720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127477#comment-16127477 ] Kazuaki Ishizaki edited comment on SPARK-21720 at 8/15/17 4:26 PM: --- In this case, to add JVM option {{-Xss512m}} eliminates this exception and this works well. However, when the number of fields is 1024, I got the following exception: {code} 08:41:40.022 ERROR org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator: failed to compile: org.codehaus.janino.JaninoRuntimeException: Code of method "apply(Lorg/apache/spark/sql/catalyst/InternalRow;)Lorg/apache/spark/sql/catalyst/expressions/UnsafeRow;" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection" grows beyond 64 KB ... {code} I am working for solving this 64KB problem. was (Author: kiszk): In this case, to add JVM option {{-Xss512m}} eliminates this exception and this works well. When the number of fields is 1024, I got the following exception: {code} 08:41:40.022 ERROR org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator: failed to compile: org.codehaus.janino.JaninoRuntimeException: Code of method "apply(Lorg/apache/spark/sql/catalyst/InternalRow;)Lorg/apache/spark/sql/catalyst/expressions/UnsafeRow;" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection" grows beyond 64 KB ... {code} I am working for solving this 64KB problem. > Filter predicate with many conditions throw stackoverflow error > --- > > Key: SPARK-21720 > URL: https://issues.apache.org/jira/browse/SPARK-21720 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: srinivasan > > When trying to filter on dataset with many predicate conditions on both spark > sql and dataset filter transformation as described below, spark throws a > stackoverflow exception > Case 1: Filter Transformation on Data > Dataset filter = sourceDataset.filter(String.format("not(%s)", > buildQuery())); > filter.show(); > where buildQuery() returns > Field1 = "" and Field2 = "" and Field3 = "" and Field4 = "" and Field5 = > "" and BLANK_5 = "" and Field7 = "" and Field8 = "" and Field9 = "" and > Field10 = "" and Field11 = "" and Field12 = "" and Field13 = "" and > Field14 = "" and Field15 = "" and Field16 = "" and Field17 = "" and > Field18 = "" and Field19 = "" and Field20 = "" and Field21 = "" and > Field22 = "" and Field23 = "" and Field24 = "" and Field25 = "" and > Field26 = "" and Field27 = "" and Field28 = "" and Field29 = "" and > Field30 = "" and Field31 = "" and Field32 = "" and Field33 = "" and > Field34 = "" and Field35 = "" and Field36 = "" and Field37 = "" and > Field38 = "" and Field39 = "" and Field40 = "" and Field41 = "" and > Field42 = "" and Field43 = "" and Field44 = "" and Field45 = "" and > Field46 = "" and Field47 = "" and Field48 = "" and Field49 = "" and > Field50 = "" and Field51 = "" and Field52 = "" and Field53 = "" and > Field54 = "" and Field55 = "" and Field56 = "" and Field57 = "" and > Field58 = "" and Field59 = "" and Field60 = "" and Field61 = "" and > Field62 = "" and Field63 = "" and Field64 = "" and Field65 = "" and > Field66 = "" and Field67 = "" and Field68 = "" and Field69 = "" and > Field70 = "" and Field71 = "" and Field72 = "" and Field73 = "" and > Field74 = "" and Field75 = "" and Field76 = "" and Field77 = "" and > Field78 = "" and Field79 = "" and Field80 = "" and Field81 = "" and > Field82 = "" and Field83 = "" and Field84 = "" and Field85 = "" and > 
Field86 = "" and Field87 = "" and Field88 = "" and Field89 = "" and > Field90 = "" and Field91 = "" and Field92 = "" and Field93 = "" and > Field94 = "" and Field95 = "" and Field96 = "" and Field97 = "" and > Field98 = "" and Field99 = "" and Field100 = "" and Field101 = "" and > Field102 = "" and Field103 = "" and Field104 = "" and Field105 = "" and > Field106 = "" and Field107 = "" and Field108 = "" and Field109 = "" and > Field110 = "" and Field111 = "" and Field112 = "" and Field113 = "" and > Field114 = "" and Field115 = "" and Field116 = "" and Field117 = "" and > Field118 = "" and Field119 = "" and Field120 = "" and Field121 = "" and > Field122 = "" and Field123 = "" and Field124 = "" and Field125 = "" and > Field126 = "" and Field127 = "" and Field128 = "" and Field129 = "" and > Field130 = "" and Field131 = "" and Field132 = "" and Field133 = "" and > Field134 = "" and Field135 = "" and Field136 = "" and Field137 = "" and > Field138 = "" and Field139 = "" and Field140 = "" and Field141 = "" and
[jira] [Commented] (SPARK-21720) Filter predicate with many conditions throw stackoverflow error
[ https://issues.apache.org/jira/browse/SPARK-21720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16125253#comment-16125253 ] Kazuaki Ishizaki commented on SPARK-21720: -- I confirmed that this occurs in the master branch. I will work for this. > Filter predicate with many conditions throw stackoverflow error > --- > > Key: SPARK-21720 > URL: https://issues.apache.org/jira/browse/SPARK-21720 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: srinivasan > > When trying to filter on dataset with many predicate conditions on both spark > sql and dataset filter transformation as described below, spark throws a > stackoverflow exception > Case 1: Filter Transformation on Data > Dataset filter = sourceDataset.filter(String.format("not(%s)", > buildQuery())); > filter.show(); > where buildQuery() returns > Field1 = "" and Field2 = "" and Field3 = "" and Field4 = "" and Field5 = > "" and BLANK_5 = "" and Field7 = "" and Field8 = "" and Field9 = "" and > Field10 = "" and Field11 = "" and Field12 = "" and Field13 = "" and > Field14 = "" and Field15 = "" and Field16 = "" and Field17 = "" and > Field18 = "" and Field19 = "" and Field20 = "" and Field21 = "" and > Field22 = "" and Field23 = "" and Field24 = "" and Field25 = "" and > Field26 = "" and Field27 = "" and Field28 = "" and Field29 = "" and > Field30 = "" and Field31 = "" and Field32 = "" and Field33 = "" and > Field34 = "" and Field35 = "" and Field36 = "" and Field37 = "" and > Field38 = "" and Field39 = "" and Field40 = "" and Field41 = "" and > Field42 = "" and Field43 = "" and Field44 = "" and Field45 = "" and > Field46 = "" and Field47 = "" and Field48 = "" and Field49 = "" and > Field50 = "" and Field51 = "" and Field52 = "" and Field53 = "" and > Field54 = "" and Field55 = "" and Field56 = "" and Field57 = "" and > Field58 = "" and Field59 = "" and Field60 = "" and Field61 = "" and > Field62 = "" and Field63 = "" and Field64 = "" and Field65 = "" and > Field66 = "" and Field67 = "" and Field68 = "" and Field69 = "" and > Field70 = "" and Field71 = "" and Field72 = "" and Field73 = "" and > Field74 = "" and Field75 = "" and Field76 = "" and Field77 = "" and > Field78 = "" and Field79 = "" and Field80 = "" and Field81 = "" and > Field82 = "" and Field83 = "" and Field84 = "" and Field85 = "" and > Field86 = "" and Field87 = "" and Field88 = "" and Field89 = "" and > Field90 = "" and Field91 = "" and Field92 = "" and Field93 = "" and > Field94 = "" and Field95 = "" and Field96 = "" and Field97 = "" and > Field98 = "" and Field99 = "" and Field100 = "" and Field101 = "" and > Field102 = "" and Field103 = "" and Field104 = "" and Field105 = "" and > Field106 = "" and Field107 = "" and Field108 = "" and Field109 = "" and > Field110 = "" and Field111 = "" and Field112 = "" and Field113 = "" and > Field114 = "" and Field115 = "" and Field116 = "" and Field117 = "" and > Field118 = "" and Field119 = "" and Field120 = "" and Field121 = "" and > Field122 = "" and Field123 = "" and Field124 = "" and Field125 = "" and > Field126 = "" and Field127 = "" and Field128 = "" and Field129 = "" and > Field130 = "" and Field131 = "" and Field132 = "" and Field133 = "" and > Field134 = "" and Field135 = "" and Field136 = "" and Field137 = "" and > Field138 = "" and Field139 = "" and Field140 = "" and Field141 = "" and > Field142 = "" and Field143 = "" and Field144 = "" and Field145 = "" and > Field146 = "" and Field147 = "" and Field148 = "" and Field149 = "" and > Field150 = "" 
and Field151 = "" and Field152 = "" and Field153 = "" and > Field154 = "" and Field155 = "" and Field156 = "" and Field157 = "" and > Field158 = "" and Field159 = "" and Field160 = "" and Field161 = "" and > Field162 = "" and Field163 = "" and Field164 = "" and Field165 = "" and > Field166 = "" and Field167 = "" and Field168 = "" and Field169 = "" and > Field170 = "" and Field171 = "" and Field172 = "" and Field173 = "" and > Field174 = "" and Field175 = "" and Field176 = "" and Field177 = "" and > Field178 = "" and Field179 = "" and Field180 = "" and Field181 = "" and > Field182 = "" and Field183 = "" and Field184 = "" and Field185 = "" and > Field186 = "" and Field187 = "" and Field188 = "" and Field189 = "" and > Field190 = "" and Field191 = "" and Field192 = "" and Field193 = "" and > Field194 = "" and Field195 = "" and Field196 = "" and Field197 = "" and > Field198 = "" and Field199 = "" and Field200 = "" and Field201 = "" and > Field202 = "" and Field203 = "" and F
[jira] [Comment Edited] (SPARK-19372) Code generation for Filter predicate including many OR conditions exceeds JVM method size limit
[ https://issues.apache.org/jira/browse/SPARK-19372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124982#comment-16124982 ] Kazuaki Ishizaki edited comment on SPARK-19372 at 8/13/17 5:05 PM: --- [~srinivasanm] I can reproduce this issue by using the master branch. I think that this is another problem. Could you please create another JIRA entry to track this issue? I will work on this. was (Author: kiszk): [~srinivasanm] I can reproduce this issue by using the master branch. I think that this is another problem. Could you please create another JIRA entry to track this issue? > Code generation for Filter predicate including many OR conditions exceeds JVM > method size limit > > > Key: SPARK-19372 > URL: https://issues.apache.org/jira/browse/SPARK-19372 > Project: Spark > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: Jay Pranavamurthi >Assignee: Kazuaki Ishizaki > Fix For: 2.2.0, 2.3.0 > > Attachments: wide400cols.csv > > > For the attached csv file, the code below causes the exception > "org.codehaus.janino.JaninoRuntimeException: Code of method > "(Lorg/apache/spark/sql/catalyst/InternalRow;)Z" of class > "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate" > grows beyond 64 KB > Code: > {code:borderStyle=solid} > val conf = new SparkConf().setMaster("local[1]") > val sqlContext = > SparkSession.builder().config(conf).getOrCreate().sqlContext > val dataframe = > sqlContext > .read > .format("com.databricks.spark.csv") > .load("wide400cols.csv") > val filter = (0 to 399) > .foldLeft(lit(false))((e, index) => > e.or(dataframe.col(dataframe.columns(index)) =!= s"column${index+1}")) > val filtered = dataframe.filter(filter) > filtered.show(100) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19372) Code generation for Filter predicate including many OR conditions exceeds JVM method size limit
[ https://issues.apache.org/jira/browse/SPARK-19372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124982#comment-16124982 ] Kazuaki Ishizaki commented on SPARK-19372: -- [~srinivasanm] I can reproduce this issue by using the master branch. I think that this is another problem. Could you please create another JIRA entry to track this issue? > Code generation for Filter predicate including many OR conditions exceeds JVM > method size limit > > > Key: SPARK-19372 > URL: https://issues.apache.org/jira/browse/SPARK-19372 > Project: Spark > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: Jay Pranavamurthi >Assignee: Kazuaki Ishizaki > Fix For: 2.2.0, 2.3.0 > > Attachments: wide400cols.csv > > > For the attached csv file, the code below causes the exception > "org.codehaus.janino.JaninoRuntimeException: Code of method > "(Lorg/apache/spark/sql/catalyst/InternalRow;)Z" of class > "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate" > grows beyond 64 KB > Code: > {code:borderStyle=solid} > val conf = new SparkConf().setMaster("local[1]") > val sqlContext = > SparkSession.builder().config(conf).getOrCreate().sqlContext > val dataframe = > sqlContext > .read > .format("com.databricks.spark.csv") > .load("wide400cols.csv") > val filter = (0 to 399) > .foldLeft(lit(false))((e, index) => > e.or(dataframe.col(dataframe.columns(index)) =!= s"column${index+1}")) > val filtered = dataframe.filter(filter) > filtered.show(100) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19372) Code generation for Filter predicate including many OR conditions exceeds JVM method size limit
[ https://issues.apache.org/jira/browse/SPARK-19372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124856#comment-16124856 ] Kazuaki Ishizaki commented on SPARK-19372: -- Thank you for letting us know about the problem. I will investigate it. > Code generation for Filter predicate including many OR conditions exceeds JVM > method size limit > > > Key: SPARK-19372 > URL: https://issues.apache.org/jira/browse/SPARK-19372 > Project: Spark > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: Jay Pranavamurthi >Assignee: Kazuaki Ishizaki > Fix For: 2.2.0, 2.3.0 > > Attachments: wide400cols.csv > > > For the attached csv file, the code below causes the exception > "org.codehaus.janino.JaninoRuntimeException: Code of method > "(Lorg/apache/spark/sql/catalyst/InternalRow;)Z" of class > "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate" > grows beyond 64 KB > Code: > {code:borderStyle=solid} > val conf = new SparkConf().setMaster("local[1]") > val sqlContext = > SparkSession.builder().config(conf).getOrCreate().sqlContext > val dataframe = > sqlContext > .read > .format("com.databricks.spark.csv") > .load("wide400cols.csv") > val filter = (0 to 399) > .foldLeft(lit(false))((e, index) => > e.or(dataframe.col(dataframe.columns(index)) =!= s"column${index+1}")) > val filtered = dataframe.filter(filter) > filtered.show(100) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21276) Update lz4-java to remove custom LZ4BlockInputStream
[ https://issues.apache.org/jira/browse/SPARK-21276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16118402#comment-16118402 ] Kazuaki Ishizaki commented on SPARK-21276: -- Would it be better to update the affected version? > Update lz4-java to remove custom LZ4BlockInputStream > - > > Key: SPARK-21276 > URL: https://issues.apache.org/jira/browse/SPARK-21276 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.1.1 >Reporter: Takeshi Yamamuro >Priority: Trivial > > We currently use a custom LZ4BlockInputStream to read concatenated byte > streams in shuffle > (https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/io/LZ4BlockInputStream.java#L38). > In the recent pr (https://github.com/lz4/lz4-java/pull/105), this > functionality is implemented even in lz4-java upstream. So, we might update > the lz4-java package that will be released in the near future. > Issue about the next lz4-java release > https://github.com/lz4/lz4-java/issues/98 > Diff between the latest release and the master in lz4-java > https://github.com/lz4/lz4-java/compare/62f7547abb0819d1ca1e669645ee1a9d26cd60b0...6480bd9e06f92471bf400c16d4d5f3fd2afa3b3d > * fixed NPE in XXHashFactory similarly > * Don't place resources in default package to support shading > * Fixes ByteBuffer methods failing to apply arrayOffset() for array-backed > * Try to load lz4-java from java.library.path, then fallback to bundled > * Add ppc64le binary > * Add s390x JNI binding > * Add basic LZ4 Frame v1.5.0 support > * enable aarch64 support for lz4-java > * Allow unsafeInstance() for ppc64le archiecture > * Add unsafeInstance support for AArch64 > * Support 64-bit JNI build on Solaris > * Avoid over-allocating a buffer > * Allow EndMark to be incompressible for LZ4FrameInputStream. > * Concat byte stream -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21390) Dataset filter api inconsistency
[ https://issues.apache.org/jira/browse/SPARK-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16110576#comment-16110576 ] Kazuaki Ishizaki commented on SPARK-21390: -- Thank you very much for pointing out the good JIRA entry. I will check it. > Dataset filter api inconsistency > > > Key: SPARK-21390 > URL: https://issues.apache.org/jira/browse/SPARK-21390 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.1, 2.1.0, 2.2.0 >Reporter: Gheorghe Gheorghe >Priority: Minor > > Hello everybody, > I've encountered a strange situation with the spark-shell. > When I run the code below in my IDE the second test case prints as expected > count "1". However, when I run the same code using the spark-shell in the > second test case I get 0 back as a count. > I've made sure that I'm running scala 2.11.8 and spark 2.0.1 in both my IDE > and spark-shell. > {code:java} > import org.apache.spark.sql.Dataset > case class SomeClass(field1:String, field2:String) > val filterCondition: Seq[SomeClass] = Seq( SomeClass("00", "01") ) > // Test 1 > val filterMe1: Dataset[SomeClass] = Seq( SomeClass("00", "01") ).toDS > > println("Works fine!" +filterMe1.filter(filterCondition.contains(_)).count) > > // Test 2 > case class OtherClass(field1:String, field2:String) > > val filterMe2 = Seq( OtherClass("00", "01"), OtherClass("00", "02")).toDS > println("Fail, count should return 1: " + filterMe2.filter(x=> > filterCondition.contains(SomeClass(x.field1, x.field2))).count) > {code} > Note if I transform the dataset first I get 1 back as expected. > {code:java} > println(filterMe2.map(x=> SomeClass(x.field1, > x.field2)).filter(filterCondition.contains(_)).count) > {code} > Is this a bug? I can see that this filter function has been marked as > experimental > https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21591) Implement treeAggregate on Dataset API
[ https://issues.apache.org/jira/browse/SPARK-21591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16108644#comment-16108644 ] Kazuaki Ishizaki commented on SPARK-21591: -- I like this idea > Implement treeAggregate on Dataset API > -- > > Key: SPARK-21591 > URL: https://issues.apache.org/jira/browse/SPARK-21591 > Project: Spark > Issue Type: Brainstorming > Components: SQL >Affects Versions: 2.2.0 >Reporter: Yanbo Liang > > The Tungsten execution engine substantially improved the efficiency of memory > and CPU for Spark application. However, in MLlib we still not migrate the > internal computing workload from {{RDD}} to {{DataFrame}}. > One of the block issue is there is no {{treeAggregate}} on {{DataFrame}}. > It's very important for MLlib algorithms, since they do aggregate on > {{Vector}} which may has millions of elements. As we all know, {{RDD}} based > {{treeAggregate}} reduces the aggregation time by an order of magnitude for > lots of MLlib > algorithms(https://databricks.com/blog/2014/09/22/spark-1-1-mllib-performance-improvements.html). > I open this JIRA to discuss to implement {{treeAggregate}} on {{DataFrame}} > API and do the performance benchmark related issues. And I think other > scenarios except for MLlib will also benefit from this improvement if we get > it done. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
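As a concrete point of reference for the proposal above, here is a minimal sketch of the existing RDD-based {{treeAggregate}} that the entry proposes to port to the Dataset API; the data, the app name, and the depth are illustrative assumptions, not taken from the JIRA entry.

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("treeAggregateSketch").getOrCreate()
val sc = spark.sparkContext

// seqOp folds each element into a per-partition accumulator; combOp merges
// accumulators. depth = 2 adds one intermediate level of combiners between
// the executors and the driver, which is what cuts aggregation time for
// large MLlib vectors compared with a plain aggregate.
val rdd = sc.parallelize(1 to 1000000).map(_.toDouble)
val sum = rdd.treeAggregate(0.0)(
  seqOp = (acc, v) => acc + v,
  combOp = (a, b) => a + b,
  depth = 2)
println(sum)
{code}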
[jira] [Commented] (SPARK-18016) Code Generation: Constant Pool Past Limit for Wide/Nested Dataset
[ https://issues.apache.org/jira/browse/SPARK-18016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16104284#comment-16104284 ] Kazuaki Ishizaki commented on SPARK-18016: -- [~jamcon] Thank you for reporting the problem. We fixed the problem for a large number (e.g. 4000) of columns. However, we know that we have not yet solved the problem for a very large number (e.g. 12000) of columns. I have just pinged the author who created the fix about solving these two problems. > Code Generation: Constant Pool Past Limit for Wide/Nested Dataset > - > > Key: SPARK-18016 > URL: https://issues.apache.org/jira/browse/SPARK-18016 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 >Reporter: Aleksander Eskilson >Assignee: Aleksander Eskilson > Fix For: 2.3.0 > > > When attempting to encode collections of large Java objects to Datasets > having very wide or deeply nested schemas, code generation can fail, yielding: > {code} > Caused by: org.codehaus.janino.JaninoRuntimeException: Constant pool for > class > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection > has grown past JVM limit of 0x > at > org.codehaus.janino.util.ClassFile.addToConstantPool(ClassFile.java:499) > at > org.codehaus.janino.util.ClassFile.addConstantNameAndTypeInfo(ClassFile.java:439) > at > org.codehaus.janino.util.ClassFile.addConstantMethodrefInfo(ClassFile.java:358) > at > org.codehaus.janino.UnitCompiler.writeConstantMethodrefInfo(UnitCompiler.java:4) > at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4547) > at org.codehaus.janino.UnitCompiler.access$7500(UnitCompiler.java:206) > at > org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:3774) > at > org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:3762) > at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:4328) > at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3762) > at > org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4933) > at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:3180) > at org.codehaus.janino.UnitCompiler.access$5000(UnitCompiler.java:206) > at > org.codehaus.janino.UnitCompiler$9.visitMethodInvocation(UnitCompiler.java:3151) > at > org.codehaus.janino.UnitCompiler$9.visitMethodInvocation(UnitCompiler.java:3139) > at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:4328) > at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3139) > at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2112) > at org.codehaus.janino.UnitCompiler.access$1700(UnitCompiler.java:206) > at > org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1377) > at > org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1370) > at org.codehaus.janino.Java$ExpressionStatement.accept(Java.java:2558) > at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:1370) > at > org.codehaus.janino.UnitCompiler.compileStatements(UnitCompiler.java:1450) > at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:2811) > at > org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1262) > at > org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1234) > at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:538) > at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:890) > at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:894) > at 
org.codehaus.janino.UnitCompiler.access$600(UnitCompiler.java:206) > at > org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:377) > at > org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:369) > at > org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1128) > at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:369) > at > org.codehaus.janino.UnitCompiler.compileDeclaredMemberTypes(UnitCompiler.java:1209) > at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:564) > at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:420) > at org.codehaus.janino.UnitCompiler.access$400(UnitCompiler.java:206) > at > org.codehaus.janino.UnitCompiler$2.visitPackageMemberClassDeclaration(UnitCompiler.java:374) > at > org.codehaus.janino.UnitCompiler$2.visitPackageMemberClassDeclaration(UnitCompiler.java:369) > at > org.codehaus.janino.Java$AbstractPackageMembe
[jira] [Commented] (SPARK-21496) Support codegen for TakeOrderedAndProjectExec
[ https://issues.apache.org/jira/browse/SPARK-21496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16099840#comment-16099840 ] Kazuaki Ishizaki commented on SPARK-21496: -- Is there any good benchmark program for this? > Support codegen for TakeOrderedAndProjectExec > - > > Key: SPARK-21496 > URL: https://issues.apache.org/jira/browse/SPARK-21496 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.2.0 >Reporter: Jiang Xingbo >Priority: Minor > > The operator `SortExec` supports codegen, but `TakeOrderedAndProjectExec` > doesn't. Perhaps we should also add codegen support for > `TakeOrderedAndProjectExec`, but we should also do benchmark for it carefully. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
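As one possible starting point for the benchmark question above, a rough sketch (the table size and key cardinality are arbitrary assumptions): {{orderBy}} followed by {{limit}} is planned as {{TakeOrderedAndProjectExec}}, so timing such a query before and after a codegen change gives a first-order comparison.

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("topKBench").getOrCreate()
import spark.implicits._

// sort + limit over a synthetic table; collect() forces execution
val df = spark.range(10000000L).selectExpr("id", "id % 1000 AS key")
val t0 = System.nanoTime()
df.orderBy($"key".desc).limit(100).collect()
println(s"elapsed: ${(System.nanoTime() - t0) / 1e9} s")
{code}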
[jira] [Commented] (SPARK-21517) Fetch local data via block manager cause oom
[ https://issues.apache.org/jira/browse/SPARK-21517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16099654#comment-16099654 ] Kazuaki Ishizaki commented on SPARK-21517: -- Does it occur in Spark 2.2? > Fetch local data via block manager cause oom > > > Key: SPARK-21517 > URL: https://issues.apache.org/jira/browse/SPARK-21517 > Project: Spark > Issue Type: Improvement > Components: Block Manager, Spark Core >Affects Versions: 1.6.1, 2.1.0 >Reporter: zhoukang > > In our production cluster,oom happens when NettyBlockRpcServer receive > OpenBlocks message.The reason we observed is below: > When BlockManagerManagedBuffer call ChunkedByteBuffer#toNetty, it will use > Unpooled.wrappedBuffer(ByteBuffer... buffers) which use default > maxNumComponents=16 in low-level CompositeByteBuf.When our component's number > is bigger than 16, it will execute during buffer copy. > {code:java} > private void consolidateIfNeeded() { > int numComponents = this.components.size(); > if(numComponents > this.maxNumComponents) { > int capacity = > ((CompositeByteBuf.Component)this.components.get(numComponents - > 1)).endOffset; > ByteBuf consolidated = this.allocBuffer(capacity); > for(int c = 0; c < numComponents; ++c) { > CompositeByteBuf.Component c1 = > (CompositeByteBuf.Component)this.components.get(c); > ByteBuf b = c1.buf; > consolidated.writeBytes(b); > c1.freeIfNecessary(); > } > CompositeByteBuf.Component var7 = new > CompositeByteBuf.Component(consolidated); > var7.endOffset = var7.length; > this.components.clear(); > this.components.add(var7); > } > } > {code} > in CompositeByteBuf which will consume some memory during buffer copy. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
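To make the description above concrete, here is a minimal sketch, assuming Netty 4.x, where {{Unpooled.wrappedBuffer}} has an overload that takes an explicit {{maxNumComponents}}; the chunk count and sizes are made up:

{code:scala}
import java.nio.ByteBuffer
import io.netty.buffer.Unpooled

// 32 components exceed the default maxNumComponents of 16, so the plain
// wrappedBuffer(buffers: _*) call would trigger consolidateIfNeeded(),
// i.e. the buffer copy quoted above. Passing a limit of at least the
// number of chunks keeps the composite buffer zero-copy.
val chunks: Array[ByteBuffer] = Array.fill(32)(ByteBuffer.allocate(4 * 1024))
val composite = Unpooled.wrappedBuffer(chunks.length, chunks: _*)
{code}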
[jira] [Commented] (SPARK-21501) Spark shuffle index cache size should be memory based
[ https://issues.apache.org/jira/browse/SPARK-21501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16099351#comment-16099351 ] Kazuaki Ishizaki commented on SPARK-21501: -- I see. I misunderstood the description. You expect that the memory cache would remain effective even when the number of entries is larger than {{spark.shuffle.service.index.cache.entries}}, as long as the total cache size is not large. > Spark shuffle index cache size should be memory based > - > > Key: SPARK-21501 > URL: https://issues.apache.org/jira/browse/SPARK-21501 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 2.1.0 >Reporter: Thomas Graves > > Right now the spark shuffle service has a cache for index files. It is based > on a # of files cached (spark.shuffle.service.index.cache.entries). This can > cause issues if people have a lot of reducers because the size of each entry > can fluctuate based on the # of reducers. > We saw an issue with a job that had 17 reducers and it caused NM with > spark shuffle service to use 700-800MB of memory in NM by itself. > We should change this cache to be memory based and only allow a certain > memory size used. When I say memory based I mean the cache should have a > limit of say 100MB. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
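A minimal sketch of what such a memory-based limit could look like, assuming Guava's {{CacheBuilder}} with weight-based eviction; the 100MB budget and the file-backed loader are illustrative, not the shuffle service's actual code:

{code:scala}
import java.io.File
import java.nio.file.Files
import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache, Weigher}

// Bound the cache by total bytes instead of by entry count: every cached
// index file is weighed by its size in bytes, and Guava evicts entries
// once the sum of weights exceeds maximumWeight.
val indexCache: LoadingCache[File, Array[Byte]] = CacheBuilder.newBuilder()
  .maximumWeight(100L * 1024 * 1024)
  .weigher(new Weigher[File, Array[Byte]] {
    override def weigh(file: File, data: Array[Byte]): Int = data.length
  })
  .build(new CacheLoader[File, Array[Byte]] {
    override def load(file: File): Array[Byte] = Files.readAllBytes(file.toPath)
  })
{code}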
[jira] [Closed] (SPARK-21387) org.apache.spark.memory.TaskMemoryManager.allocatePage causes OOM
[ https://issues.apache.org/jira/browse/SPARK-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki closed SPARK-21387. Resolution: Cannot Reproduce > org.apache.spark.memory.TaskMemoryManager.allocatePage causes OOM > - > > Key: SPARK-21387 > URL: https://issues.apache.org/jira/browse/SPARK-21387 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: Kazuaki Ishizaki > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-21387) org.apache.spark.memory.TaskMemoryManager.allocatePage causes OOM
[ https://issues.apache.org/jira/browse/SPARK-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki closed SPARK-21387. Resolution: Fixed > org.apache.spark.memory.TaskMemoryManager.allocatePage causes OOM > - > > Key: SPARK-21387 > URL: https://issues.apache.org/jira/browse/SPARK-21387 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: Kazuaki Ishizaki > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-21387) org.apache.spark.memory.TaskMemoryManager.allocatePage causes OOM
[ https://issues.apache.org/jira/browse/SPARK-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki reopened SPARK-21387: -- > org.apache.spark.memory.TaskMemoryManager.allocatePage causes OOM > - > > Key: SPARK-21387 > URL: https://issues.apache.org/jira/browse/SPARK-21387 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: Kazuaki Ishizaki > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21387) org.apache.spark.memory.TaskMemoryManager.allocatePage causes OOM
[ https://issues.apache.org/jira/browse/SPARK-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16098545#comment-16098545 ] Kazuaki Ishizaki commented on SPARK-21387: -- While I got an OOM in my unit test, I have to reinvestigate whether the unit test reflects the actual runtime restrictions. > org.apache.spark.memory.TaskMemoryManager.allocatePage causes OOM > - > > Key: SPARK-21387 > URL: https://issues.apache.org/jira/browse/SPARK-21387 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.0 >Reporter: Kazuaki Ishizaki > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21501) Spark shuffle index cache size should be memory based
[ https://issues.apache.org/jira/browse/SPARK-21501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16098531#comment-16098531 ] Kazuaki Ishizaki commented on SPARK-21501: -- I guess that using Spark 2.1 or a later version alleviates this issue, thanks to SPARK-15074. > Spark shuffle index cache size should be memory based > - > > Key: SPARK-21501 > URL: https://issues.apache.org/jira/browse/SPARK-21501 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 2.0.0 >Reporter: Thomas Graves > > Right now the spark shuffle service has a cache for index files. It is based > on a # of files cached (spark.shuffle.service.index.cache.entries). This can > cause issues if people have a lot of reducers because the size of each entry > can fluctuate based on the # of reducers. > We saw an issue with a job that had 17 reducers and it caused NM with > spark shuffle service to use 700-800MB of memory in NM by itself. > We should change this cache to be memory based and only allow a certain > memory size used. When I say memory based I mean the cache should have a > limit of say 100MB. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-21516) overriding afterEach() in DatasetCacheSuite must call super.afterEach()
Kazuaki Ishizaki created SPARK-21516: Summary: overriding afterEach() in DatasetCacheSuite must call super.afterEach() Key: SPARK-21516 URL: https://issues.apache.org/jira/browse/SPARK-21516 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.3.0 Reporter: Kazuaki Ishizaki When we override the {{afterEach()}} method in a test suite, we have to call {{super.afterEach()}}. This is a follow-up of SPARK-21512. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
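A minimal sketch of the pattern this entry asks for, assuming a ScalaTest suite that mixes in {{BeforeAndAfterEach}}; the suite name and the cleanup body are placeholders:

{code:scala}
import org.scalatest.{BeforeAndAfterEach, FunSuite}

class ExampleSuite extends FunSuite with BeforeAndAfterEach {

  override def afterEach(): Unit = {
    try {
      // suite-specific cleanup, e.g. unpersisting Datasets cached by a test
    } finally {
      // always delegate to the parent hook so shared fixtures are torn down
      super.afterEach()
    }
  }

  test("example") {
    assert(1 + 1 === 2)
  }
}
{code}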
[jira] [Comment Edited] (SPARK-21512) DatasetCacheSuite needs to execute unpersistent after executing peristent
[ https://issues.apache.org/jira/browse/SPARK-21512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097565#comment-16097565 ] Kazuaki Ishizaki edited comment on SPARK-21512 at 7/24/17 4:53 AM: --- When {{DatasetCacheSuite}} is executed, the following warning messages appear. The first test case {{"get storage level"}} makes a dataset persistent and never unpersists it, so the dataset is still cached when the second test case {{"persist and then rebind right encoder when join 2 datasets"}} runs. As a result, when both test cases run, the second case's attempt to make the dataset persistent has no effect; when we run only the second case, it does make the dataset persistent. Since the second test case's behavior should not depend on execution order, the first test case should correctly make the dataset unpersistent. {code} 01:52:48.595 WARN org.apache.spark.sql.execution.CacheManager: Asked to cache already cached data. 01:52:48.692 WARN org.apache.spark.sql.execution.CacheManager: Asked to cache already cached data. {code} was (Author: kiszk): When {DatasetCacheSuite} is executed, the following warning messages appear. Unpersistent dataset is made persistent in the second test case {{"persist and then rebind right encoder when join 2 datasets"}} after the first test case {{"get storage level"}} made it persistent. Thus, we run these test cases, the second case does not perform to make dataset persistent. This is because in When we run only the second case, it performs to make dataset persistent. It is not good to change behavior of the second test suite. The first test case should correctly make dataset unpersistent. {code} 01:52:48.595 WARN org.apache.spark.sql.execution.CacheManager: Asked to cache already cached data. 01:52:48.692 WARN org.apache.spark.sql.execution.CacheManager: Asked to cache already cached data. {code} > DatasetCacheSuite needs to execute unpersistent after executing peristent > - > > Key: SPARK-21512 > URL: https://issues.apache.org/jira/browse/SPARK-21512 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Kazuaki Ishizaki >Assignee: Kazuaki Ishizaki > Fix For: 2.3.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
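The fix described in the comment above amounts to pairing every {{persist}} in a test case with an {{unpersist}} before the test case ends; a minimal sketch of that pairing (the data and the storage level are illustrative):

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder().master("local[*]").appName("persistUnpersist").getOrCreate()
import spark.implicits._

val ds = Seq(1, 2, 3).toDS()
ds.persist(StorageLevel.MEMORY_AND_DISK)
try {
  // the storage level set here is visible only while this test holds the cache
  assert(ds.storageLevel == StorageLevel.MEMORY_AND_DISK)
  assert(ds.count() == 3)
} finally {
  // leave no cached data behind, so the next test case starts clean
  ds.unpersist()
}
{code}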
[jira] [Commented] (SPARK-21512) DatasetCacheSuite needs to execute unpersistent after executing peristent
[ https://issues.apache.org/jira/browse/SPARK-21512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097565#comment-16097565 ] Kazuaki Ishizaki commented on SPARK-21512: -- When {{DatasetCacheSuite}} is executed, the following warning messages appear. The first test case {{"get storage level"}} makes a dataset persistent and never unpersists it, so the dataset is still cached when the second test case {{"persist and then rebind right encoder when join 2 datasets"}} runs. As a result, when both test cases run, the second case's attempt to make the dataset persistent has no effect; when we run only the second case, it does make the dataset persistent. Since the second test case's behavior should not depend on execution order, the first test case should correctly make the dataset unpersistent. {code} 01:52:48.595 WARN org.apache.spark.sql.execution.CacheManager: Asked to cache already cached data. 01:52:48.692 WARN org.apache.spark.sql.execution.CacheManager: Asked to cache already cached data. {code} > DatasetCacheSuite needs to execute unpersistent after executing peristent > - > > Key: SPARK-21512 > URL: https://issues.apache.org/jira/browse/SPARK-21512 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Kazuaki Ishizaki > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21512) DatasetCacheSuite needs to execute unpersistent after executing peristent
[ https://issues.apache.org/jira/browse/SPARK-21512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki updated SPARK-21512: - Summary: DatasetCacheSuite needs to execute unpersistent after executing peristent (was: DatasetCacheSuites need to execute unpersistent after executing peristent) > DatasetCacheSuite needs to execute unpersistent after executing peristent > - > > Key: SPARK-21512 > URL: https://issues.apache.org/jira/browse/SPARK-21512 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Kazuaki Ishizaki > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21512) DatasetCacheSuites need to execute unpersistent after executing peristent
[ https://issues.apache.org/jira/browse/SPARK-21512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki updated SPARK-21512: - Summary: DatasetCacheSuites need to execute unpersistent after executing peristent (was: DatasetCacheSuite need to execute unpersistent after executing peristent) > DatasetCacheSuites need to execute unpersistent after executing peristent > - > > Key: SPARK-21512 > URL: https://issues.apache.org/jira/browse/SPARK-21512 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Kazuaki Ishizaki > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-21512) DatasetCacheSuite need to execute unpersistent after executing peristent
Kazuaki Ishizaki created SPARK-21512: Summary: DatasetCacheSuite need to execute unpersistent after executing peristent Key: SPARK-21512 URL: https://issues.apache.org/jira/browse/SPARK-21512 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.3.0 Reporter: Kazuaki Ishizaki -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-20822) Generate code to get value from ColumnVector in ColumnarBatch
[ https://issues.apache.org/jira/browse/SPARK-20822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki updated SPARK-20822: - Summary: Generate code to get value from ColumnVector in ColumnarBatch (was: Generate code to build table cache using ColumnarBatch and to get value from ColumnVector) > Generate code to get value from ColumnVector in ColumnarBatch > - > > Key: SPARK-20822 > URL: https://issues.apache.org/jira/browse/SPARK-20822 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Kazuaki Ishizaki > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-20822) Generate code to get value from CachedBatchColumnVector in ColumnarBatch
[ https://issues.apache.org/jira/browse/SPARK-20822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki updated SPARK-20822: - Summary: Generate code to get value from CachedBatchColumnVector in ColumnarBatch (was: Generate code to get value from ColumnVector in ColumnarBatch) > Generate code to get value from CachedBatchColumnVector in ColumnarBatch > > > Key: SPARK-20822 > URL: https://issues.apache.org/jira/browse/SPARK-20822 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Kazuaki Ishizaki > -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21443) Very long planning duration for queries with lots of operations
[ https://issues.apache.org/jira/browse/SPARK-21443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16090242#comment-16090242 ] Kazuaki Ishizaki commented on SPARK-21443: -- These two optimizations, {{InferFiltersFromConstraints}} and {{PruneFilters}}, are known to be time-consuming. Since it is not easy to fix the root cause, the Spark community introduced an option {{spark.sql.constraintPropagation.enabled}} to disable these optimizations in [this PR|https://github.com/apache/spark/pull/17186]. Is it possible to alleviate the problem by using this option? > Very long planning duration for queries with lots of operations > --- > > Key: SPARK-21443 > URL: https://issues.apache.org/jira/browse/SPARK-21443 > Project: Spark > Issue Type: Bug > Components: SQL, Structured Streaming >Affects Versions: 2.2.0 >Reporter: Eyal Zituny > > Creating a streaming query with large amount of operations and fields (100+) > results in a very long query planning phase. In the example below, the plan > phase has taken 35 seconds while the actual batch execution took only 1.3 > second. > after some investigation, I have found out that the root causes of this are 2 > optimizer rules which seems to take most of the planning time: > InferFiltersFromConstraints and PruneFilters > I would suggest the following: > # fix the inefficient optimizer rules > # add warn level logging if a rule has taken more than xx ms > # allow custom removing of optimizer rules (opposite to > spark.experimental.extraOptimizations) > # reuse query plans (optional) where possible > reproducing this issue can be done with the below script which simulates the > scenario: > {code:java} > import org.apache.spark.sql.SparkSession > import org.apache.spark.sql.execution.streaming.MemoryStream > import > org.apache.spark.sql.streaming.StreamingQueryListener.{QueryProgressEvent, > QueryStartedEvent, QueryTerminatedEvent} > import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQueryListener} > case class Product(pid: Long, name: String, price: Long, ts: Long = > System.currentTimeMillis()) > case class Events (eventId: Long, eventName: String, productId: Long) { > def this(id: Long) = this(id, s"event$id", id%100) > } > object SparkTestFlow { > def main(args: Array[String]): Unit = { > val spark = SparkSession > .builder > .appName("TestFlow") > .master("local[8]") > .getOrCreate() > spark.sqlContext.streams.addListener(new StreamingQueryListener > { > override def onQueryTerminated(event: > QueryTerminatedEvent): Unit = {} > override def onQueryProgress(event: > QueryProgressEvent): Unit = { > if (event.progress.numInputRows>0) { > println(event.progress.toString()) > } > } > override def onQueryStarted(event: QueryStartedEvent): > Unit = {} > }) > > import spark.implicits._ > implicit val sclContext = spark.sqlContext > import org.apache.spark.sql.functions.expr > val seq = (1L to 100L).map(i => Product(i, s"name$i", 10L*i)) > val lookupTable = spark.createDataFrame(seq) > val inputData = MemoryStream[Events] > inputData.addData((1L to 100L).map(i => new Events(i))) > val events = inputData.toDF() > .withColumn("w1", expr("0")) > .withColumn("x1", expr("0")) > .withColumn("y1", expr("0")) > .withColumn("z1", expr("0")) > val numberOfSelects = 40 // set to 100+ and the planning takes > forever > val dfWithSelectsExpr = (2 to > numberOfSelects).foldLeft(events)((df,i) =>{ > val arr = df.columns.++(Array(s"w${i-1} + rand() as > w$i", s"x${i-1} + rand() as x$i", s"y${i-1} + 2 as y$i", 
s"z${i-1} +1 as > z$i")) > df.selectExpr(arr:_*) > }) > val withJoinAndFilter = dfWithSelectsExpr > .join(lookupTable, expr("productId = pid")) > .filter("productId < 50") > val query = withJoinAndFilter.writeStream > .outputMode("append") > .format("console") > .trigger(ProcessingTime(2000)) > .start() > query.processAllAvailable() > spark.stop() > } > } > {code} > the query progress output will show: > {code:java} > "durationMs" : { > "addBatch" : 1310, > "getBatch" : 6, > "getO
[jira] [Commented] (SPARK-21415) Triage scapegoat warnings, part 1
[ https://issues.apache.org/jira/browse/SPARK-21415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16089915#comment-16089915 ] Kazuaki Ishizaki commented on SPARK-21415: -- I see. If another JIRA entry is created for these scapegoat warning triages, we could turn them into an umbrella issue. > Triage scapegoat warnings, part 1 > - > > Key: SPARK-21415 > URL: https://issues.apache.org/jira/browse/SPARK-21415 > Project: Spark > Issue Type: Improvement > Components: GraphX, ML, Spark Core, SQL, Structured Streaming >Affects Versions: 2.3.0 >Reporter: Sean Owen >Assignee: Sean Owen >Priority: Minor > > Following the results of the scapegoat plugin at > https://docs.google.com/spreadsheets/d/1z7xNMjx7VCJLCiHOHhTth7Hh4R0F6LwcGjEwCDzrCiM/edit#gid=767668040 > and some initial triage, I'd like to address all of the valid instances of > some classes of warning: > - BigDecimal double constructor > - Catching NPE > - Finalizer without super > - List.size is O(n) > - Prefer Seq.empty > - Prefer Set.empty > - reverse.map instead of reverseMap > - Type shadowing > - Unnecessary if condition. > - Use .log1p > - Var could be val -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
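For illustration, minimal Scala snippets for a few of the warning classes listed above (the values are made up):

{code:scala}
// "Prefer Seq.empty" / "Prefer Set.empty": reuse the shared empty instance
val noInts: Seq[Int] = Seq.empty
val noNames: Set[String] = Set.empty

// "Use .log1p": log1p(x) is more accurate than log(1 + x) for small x
val tiny = 1e-12
val viaLog1p = math.log1p(tiny)

// "reverse.map instead of reverseMap": one traversal instead of two
val doubledReversed = List(1, 2, 3).reverseMap(_ * 2)

// "Var could be val": a binding that is never reassigned should be a val
val answer = 42
{code}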
[jira] [Commented] (SPARK-21390) Dataset filter api inconsistency
[ https://issues.apache.org/jira/browse/SPARK-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16089012#comment-16089012 ] Kazuaki Ishizaki commented on SPARK-21390: -- cc: [~ueshin] Is there any thought on this? > Dataset filter api inconsistency > > > Key: SPARK-21390 > URL: https://issues.apache.org/jira/browse/SPARK-21390 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.1, 2.1.0, 2.2.0 >Reporter: Gheorghe Gheorghe >Priority: Minor > > Hello everybody, > I've encountered a strange situation with the spark-shell. > When I run the code below in my IDE the second test case prints as expected > count "1". However, when I run the same code using the spark-shell in the > second test case I get 0 back as a count. > I've made sure that I'm running scala 2.11.8 and spark 2.0.1 in both my IDE > and spark-shell. > {code:java} > import org.apache.spark.sql.Dataset > case class SomeClass(field1:String, field2:String) > val filterCondition: Seq[SomeClass] = Seq( SomeClass("00", "01") ) > // Test 1 > val filterMe1: Dataset[SomeClass] = Seq( SomeClass("00", "01") ).toDS > > println("Works fine!" +filterMe1.filter(filterCondition.contains(_)).count) > > // Test 2 > case class OtherClass(field1:String, field2:String) > > val filterMe2 = Seq( OtherClass("00", "01"), OtherClass("00", "02")).toDS > println("Fail, count should return 1: " + filterMe2.filter(x=> > filterCondition.contains(SomeClass(x.field1, x.field2))).count) > {code} > Note if I transform the dataset first I get 1 back as expected. > {code:java} > println(filterMe2.map(x=> SomeClass(x.field1, > x.field2)).filter(filterCondition.contains(_)).count) > {code} > Is this a bug? I can see that this filter function has been marked as > experimental > https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21390) Dataset filter api inconsistency
[ https://issues.apache.org/jira/browse/SPARK-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088977#comment-16088977 ] Kazuaki Ishizaki commented on SPARK-21390: -- When I ran the following test suite in ReplSuite.scala, I got the assertion error at the last assertion. {code:java} test("SPARK-21390: incorrect filter with case class") { val output = runInterpreter("local", """ |case class SomeClass(f1: String, f2: String) |val ds = Seq(SomeClass("a", "b")).toDS |val filterCond = Seq(SomeClass("a", "b")) |ds.filter(x => filterCond.contains(SomeClass(x.f1, x.f2))).show """.stripMargin) print(s"$output\n") assertDoesNotContain("error:", output) assertDoesNotContain("Exception", output) assertContains("| a| b|", output) } {code} > Dataset filter api inconsistency > > > Key: SPARK-21390 > URL: https://issues.apache.org/jira/browse/SPARK-21390 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.1, 2.1.0, 2.2.0 >Reporter: Gheorghe Gheorghe >Priority: Minor > > Hello everybody, > I've encountered a strange situation with the spark-shell. > When I run the code below in my IDE the second test case prints as expected > count "1". However, when I run the same code using the spark-shell in the > second test case I get 0 back as a count. > I've made sure that I'm running scala 2.11.8 and spark 2.0.1 in both my IDE > and spark-shell. > {code:java} > import org.apache.spark.sql.Dataset > case class SomeClass(field1:String, field2:String) > val filterCondition: Seq[SomeClass] = Seq( SomeClass("00", "01") ) > // Test 1 > val filterMe1: Dataset[SomeClass] = Seq( SomeClass("00", "01") ).toDS > > println("Works fine!" +filterMe1.filter(filterCondition.contains(_)).count) > > // Test 2 > case class OtherClass(field1:String, field2:String) > > val filterMe2 = Seq( OtherClass("00", "01"), OtherClass("00", "02")).toDS > println("Fail, count should return 1: " + filterMe2.filter(x=> > filterCondition.contains(SomeClass(x.field1, x.field2))).count) > {code} > Note if I transform the dataset first I get 1 back as expected. > {code:java} > println(filterMe2.map(x=> SomeClass(x.field1, > x.field2)).filter(filterCondition.contains(_)).count) > {code} > Is this a bug? I can see that this filter function has been marked as > experimental > https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-21418) NoSuchElementException: None.get on DataFrame.rdd
[ https://issues.apache.org/jira/browse/SPARK-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088659#comment-16088659 ] Kazuaki Ishizaki edited comment on SPARK-21418 at 7/16/17 8:25 AM: --- I am curious why {{java.io.ObjectOutputStream.writeOrdinaryObject}} calls {{toString}} method. Do you specify some option to run this program for JVM? was (Author: kiszk): I am curious why {{java.io.ObjectOutputStream.writeOrdinaryObject}} calls `toString` method. Do you specify some option to run this program for JVM? > NoSuchElementException: None.get on DataFrame.rdd > - > > Key: SPARK-21418 > URL: https://issues.apache.org/jira/browse/SPARK-21418 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Daniel Darabos > > I don't have a minimal reproducible example yet, sorry. I have the following > lines in a unit test for our Spark application: > {code} > val df = mySparkSession.read.format("jdbc") > .options(Map("url" -> url, "dbtable" -> "test_table")) > .load() > df.show > println(df.rdd.collect) > {code} > The output shows the DataFrame contents from {{df.show}}. But the {{collect}} > fails: > {noformat} > org.apache.spark.SparkException: Job aborted due to stage failure: Task > serialization failed: java.util.NoSuchElementException: None.get > java.util.NoSuchElementException: None.get > at scala.None$.get(Option.scala:347) > at scala.None$.get(Option.scala:345) > at > org.apache.spark.sql.execution.DataSourceScanExec$class.org$apache$spark$sql$execution$DataSourceScanExec$$redact(DataSourceScanExec.scala:70) > at > org.apache.spark.sql.execution.DataSourceScanExec$$anonfun$4.apply(DataSourceScanExec.scala:54) > at > org.apache.spark.sql.execution.DataSourceScanExec$$anonfun$4.apply(DataSourceScanExec.scala:52) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.AbstractTraversable.map(Traversable.scala:104) > at > org.apache.spark.sql.execution.DataSourceScanExec$class.simpleString(DataSourceScanExec.scala:52) > at > org.apache.spark.sql.execution.RowDataSourceScanExec.simpleString(DataSourceScanExec.scala:75) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.verboseString(QueryPlan.scala:349) > at > org.apache.spark.sql.execution.RowDataSourceScanExec.org$apache$spark$sql$execution$DataSourceScanExec$$super$verboseString(DataSourceScanExec.scala:75) > at > org.apache.spark.sql.execution.DataSourceScanExec$class.verboseString(DataSourceScanExec.scala:60) > at > org.apache.spark.sql.execution.RowDataSourceScanExec.verboseString(DataSourceScanExec.scala:75) > at > org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:556) > at > org.apache.spark.sql.execution.WholeStageCodegenExec.generateTreeString(WholeStageCodegenExec.scala:451) > at > org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:576) > at > org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:480) > at > org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:477) > at org.apache.spark.sql.catalyst.trees.TreeNode.toString(TreeNode.scala:474) > at > 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1421) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStre
[jira] [Comment Edited] (SPARK-21393) spark (pyspark) crashes unpredictably when using show() or toPandas()
[ https://issues.apache.org/jira/browse/SPARK-21393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088660#comment-16088660 ] Kazuaki Ishizaki edited comment on SPARK-21393 at 7/15/17 4:53 PM: --- I confirmed that this python program works well without an exception after applying a PR for SPARK-21413. was (Author: kiszk): I confirmed that this python program works well after applying a PR for SPARK-21413. > spark (pyspark) crashes unpredictably when using show() or toPandas() > - > > Key: SPARK-21393 > URL: https://issues.apache.org/jira/browse/SPARK-21393 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.1.1, 2.2.0 > Environment: Windows 10 > python 2.7 >Reporter: Zahra > Attachments: Data.zip, working_ST_pyspark.py > > > unpredictbly run into this error either when using > `pyspark.sql.DataFrame.show()` or `pyspark.sql.DataFrame.toPandas()` > error log starts with (truncated) : > {noformat} > 17/07/12 16:03:09 ERROR CodeGenerator: failed to compile: > org.codehaus.janino.JaninoRuntimeException: Code of method > "apply_47$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V" > of class > "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection" > grows beyond 64 KB > /* 001 */ public java.lang.Object generate(Object[] references) { > /* 002 */ return new SpecificUnsafeProjection(references); > /* 003 */ } > /* 004 */ > /* 005 */ class SpecificUnsafeProjection extends > org.apache.spark.sql.catalyst.expressions.UnsafeProjection { > /* 006 */ > /* 007 */ private Object[] references; > /* 008 */ private scala.collection.immutable.Set set; > /* 009 */ private scala.collection.immutable.Set set1; > /* 010 */ private scala.collection.immutable.Set set2; > /* 011 */ private scala.collection.immutable.Set set3; > /* 012 */ private UTF8String.IntWrapper wrapper; > /* 013 */ private UTF8String.IntWrapper wrapper1; > /* 014 */ private scala.collection.immutable.Set set4; > /* 015 */ private UTF8String.IntWrapper wrapper2; > /* 016 */ private UTF8String.IntWrapper wrapper3; > /* 017 */ private scala.collection.immutable.Set set5; > /* 018 */ private scala.collection.immutable.Set set6; > /* 019 */ private scala.collection.immutable.Set set7; > /* 020 */ private UTF8String.IntWrapper wrapper4; > /* 021 */ private UTF8String.IntWrapper wrapper5; > /* 022 */ private scala.collection.immutable.Set set8; > /* 023 */ private UTF8String.IntWrapper wrapper6; > /* 024 */ private UTF8String.IntWrapper wrapper7; > /* 025 */ private scala.collection.immutable.Set set9; > /* 026 */ private scala.collection.immutable.Set set10; > /* 027 */ private scala.collection.immutable.Set set11; > /* 028 */ private UTF8String.IntWrapper wrapper8; > /* 029 */ private UTF8String.IntWrapper wrapper9; > /* 030 */ private scala.collection.immutable.Set set12; > /* 031 */ private UTF8String.IntWrapper wrapper10; > /* 032 */ private UTF8String.IntWrapper wrapper11; > /* 033 */ private scala.collection.immutable.Set set13; > /* 034 */ private scala.collection.immutable.Set set14; > /* 035 */ private scala.collection.immutable.Set set15; > /* 036 */ private UTF8String.IntWrapper wrapper12; > /* 037 */ private UTF8String.IntWrapper wrapper13; > /* 038 */ private scala.collection.immutable.Set set16; > /* 039 */ private UTF8String.IntWrapper wrapper14; > /* 040 */ private UTF8String.IntWrapper wrapper15; > /* 041 */ private scala.collection.immutable.Set set17; > /* 
042 */ private scala.collection.immutable.Set set18; > /* 043 */ private scala.collection.immutable.Set set19; > /* 044 */ private UTF8String.IntWrapper wrapper16; > /* 045 */ private UTF8String.IntWrapper wrapper17; > /* 046 */ private scala.collection.immutable.Set set20; > /* 047 */ private UTF8String.IntWrapper wrapper18; > /* 048 */ private UTF8String.IntWrapper wrapper19; > /* 049 */ private scala.collection.immutable.Set set21; > /* 050 */ private scala.collection.immutable.Set set22; > /* 051 */ private scala.collection.immutable.Set set23; > /* 052 */ private UTF8String.IntWrapper wrapper20; > /* 053 */ private UTF8String.IntWrapper wrapper21; > /* 054 */ private scala.collection.immutable.Set set24; > /* 055 */ private UTF8String.IntWrapper wrapper22; > /* 056 */ private UTF8String.IntWrapper wrapper23; > /* 057 */ private scala.collection.immutable.Set set25; > /* 058 */ private scala.collection.immutable.Set set26; > /* 059 */ private scala.collection.immutable.Set set27; > /* 060 */ private UTF8String.IntWrapper wrapper24; > /* 061 */ private UTF8String.I
[jira] [Commented] (SPARK-21393) spark (pyspark) crashes unpredictably when using show() or toPandas()
[ https://issues.apache.org/jira/browse/SPARK-21393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088660#comment-16088660 ] Kazuaki Ishizaki commented on SPARK-21393: -- I confirmed that this python program works well after applying a PR for SPARK-21413. > spark (pyspark) crashes unpredictably when using show() or toPandas() > - > > Key: SPARK-21393 > URL: https://issues.apache.org/jira/browse/SPARK-21393 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.1.1, 2.2.0 > Environment: Windows 10 > python 2.7 >Reporter: Zahra > Attachments: Data.zip, working_ST_pyspark.py > > > unpredictbly run into this error either when using > `pyspark.sql.DataFrame.show()` or `pyspark.sql.DataFrame.toPandas()` > error log starts with (truncated) : > {noformat} > 17/07/12 16:03:09 ERROR CodeGenerator: failed to compile: > org.codehaus.janino.JaninoRuntimeException: Code of method > "apply_47$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V" > of class > "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection" > grows beyond 64 KB > /* 001 */ public java.lang.Object generate(Object[] references) { > /* 002 */ return new SpecificUnsafeProjection(references); > /* 003 */ } > /* 004 */ > /* 005 */ class SpecificUnsafeProjection extends > org.apache.spark.sql.catalyst.expressions.UnsafeProjection { > /* 006 */ > /* 007 */ private Object[] references; > /* 008 */ private scala.collection.immutable.Set set; > /* 009 */ private scala.collection.immutable.Set set1; > /* 010 */ private scala.collection.immutable.Set set2; > /* 011 */ private scala.collection.immutable.Set set3; > /* 012 */ private UTF8String.IntWrapper wrapper; > /* 013 */ private UTF8String.IntWrapper wrapper1; > /* 014 */ private scala.collection.immutable.Set set4; > /* 015 */ private UTF8String.IntWrapper wrapper2; > /* 016 */ private UTF8String.IntWrapper wrapper3; > /* 017 */ private scala.collection.immutable.Set set5; > /* 018 */ private scala.collection.immutable.Set set6; > /* 019 */ private scala.collection.immutable.Set set7; > /* 020 */ private UTF8String.IntWrapper wrapper4; > /* 021 */ private UTF8String.IntWrapper wrapper5; > /* 022 */ private scala.collection.immutable.Set set8; > /* 023 */ private UTF8String.IntWrapper wrapper6; > /* 024 */ private UTF8String.IntWrapper wrapper7; > /* 025 */ private scala.collection.immutable.Set set9; > /* 026 */ private scala.collection.immutable.Set set10; > /* 027 */ private scala.collection.immutable.Set set11; > /* 028 */ private UTF8String.IntWrapper wrapper8; > /* 029 */ private UTF8String.IntWrapper wrapper9; > /* 030 */ private scala.collection.immutable.Set set12; > /* 031 */ private UTF8String.IntWrapper wrapper10; > /* 032 */ private UTF8String.IntWrapper wrapper11; > /* 033 */ private scala.collection.immutable.Set set13; > /* 034 */ private scala.collection.immutable.Set set14; > /* 035 */ private scala.collection.immutable.Set set15; > /* 036 */ private UTF8String.IntWrapper wrapper12; > /* 037 */ private UTF8String.IntWrapper wrapper13; > /* 038 */ private scala.collection.immutable.Set set16; > /* 039 */ private UTF8String.IntWrapper wrapper14; > /* 040 */ private UTF8String.IntWrapper wrapper15; > /* 041 */ private scala.collection.immutable.Set set17; > /* 042 */ private scala.collection.immutable.Set set18; > /* 043 */ private scala.collection.immutable.Set set19; > /* 044 */ private UTF8String.IntWrapper 
wrapper16; > /* 045 */ private UTF8String.IntWrapper wrapper17; > /* 046 */ private scala.collection.immutable.Set set20; > /* 047 */ private UTF8String.IntWrapper wrapper18; > /* 048 */ private UTF8String.IntWrapper wrapper19; > /* 049 */ private scala.collection.immutable.Set set21; > /* 050 */ private scala.collection.immutable.Set set22; > /* 051 */ private scala.collection.immutable.Set set23; > /* 052 */ private UTF8String.IntWrapper wrapper20; > /* 053 */ private UTF8String.IntWrapper wrapper21; > /* 054 */ private scala.collection.immutable.Set set24; > /* 055 */ private UTF8String.IntWrapper wrapper22; > /* 056 */ private UTF8String.IntWrapper wrapper23; > /* 057 */ private scala.collection.immutable.Set set25; > /* 058 */ private scala.collection.immutable.Set set26; > /* 059 */ private scala.collection.immutable.Set set27; > /* 060 */ private UTF8String.IntWrapper wrapper24; > /* 061 */ private UTF8String.IntWrapper wrapper25; > /* 062 */ private scala.collection.immutable.Set set28; > /* 063 */ private UTF8String.IntWrapper wrapper26; > /* 064 */ private UTF8String.IntWrapper
[jira] [Comment Edited] (SPARK-21418) NoSuchElementException: None.get on DataFrame.rdd
[ https://issues.apache.org/jira/browse/SPARK-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088659#comment-16088659 ] Kazuaki Ishizaki edited comment on SPARK-21418 at 7/15/17 4:43 PM: --- I am curious why {{java.io.ObjectOutputStream.writeOrdinaryObject}} calls `toString` method. Do you specify some option to run this program for JVM? was (Author: kiszk): I am curious why {java.io.ObjectOutputStream.writeOrdinaryObject} calls `toString` method. Do you specify some option to run this program for JVM? > NoSuchElementException: None.get on DataFrame.rdd > - > > Key: SPARK-21418 > URL: https://issues.apache.org/jira/browse/SPARK-21418 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Daniel Darabos > > I don't have a minimal reproducible example yet, sorry. I have the following > lines in a unit test for our Spark application: > {code} > val df = mySparkSession.read.format("jdbc") > .options(Map("url" -> url, "dbtable" -> "test_table")) > .load() > df.show > println(df.rdd.collect) > {code} > The output shows the DataFrame contents from {{df.show}}. But the {{collect}} > fails: > {noformat} > org.apache.spark.SparkException: Job aborted due to stage failure: Task > serialization failed: java.util.NoSuchElementException: None.get > java.util.NoSuchElementException: None.get > at scala.None$.get(Option.scala:347) > at scala.None$.get(Option.scala:345) > at > org.apache.spark.sql.execution.DataSourceScanExec$class.org$apache$spark$sql$execution$DataSourceScanExec$$redact(DataSourceScanExec.scala:70) > at > org.apache.spark.sql.execution.DataSourceScanExec$$anonfun$4.apply(DataSourceScanExec.scala:54) > at > org.apache.spark.sql.execution.DataSourceScanExec$$anonfun$4.apply(DataSourceScanExec.scala:52) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.AbstractTraversable.map(Traversable.scala:104) > at > org.apache.spark.sql.execution.DataSourceScanExec$class.simpleString(DataSourceScanExec.scala:52) > at > org.apache.spark.sql.execution.RowDataSourceScanExec.simpleString(DataSourceScanExec.scala:75) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.verboseString(QueryPlan.scala:349) > at > org.apache.spark.sql.execution.RowDataSourceScanExec.org$apache$spark$sql$execution$DataSourceScanExec$$super$verboseString(DataSourceScanExec.scala:75) > at > org.apache.spark.sql.execution.DataSourceScanExec$class.verboseString(DataSourceScanExec.scala:60) > at > org.apache.spark.sql.execution.RowDataSourceScanExec.verboseString(DataSourceScanExec.scala:75) > at > org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:556) > at > org.apache.spark.sql.execution.WholeStageCodegenExec.generateTreeString(WholeStageCodegenExec.scala:451) > at > org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:576) > at > org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:480) > at > org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:477) > at org.apache.spark.sql.catalyst.trees.TreeNode.toString(TreeNode.scala:474) > at > 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1421) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.j
[jira] [Commented] (SPARK-21418) NoSuchElementException: None.get on DataFrame.rdd
[ https://issues.apache.org/jira/browse/SPARK-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088659#comment-16088659 ] Kazuaki Ishizaki commented on SPARK-21418: -- I am curious why {java.io.ObjectOutputStream.writeOrdinaryObject} calls `toString` method. Do you specify some option to run this program for JVM? > NoSuchElementException: None.get on DataFrame.rdd > - > > Key: SPARK-21418 > URL: https://issues.apache.org/jira/browse/SPARK-21418 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Daniel Darabos > > I don't have a minimal reproducible example yet, sorry. I have the following > lines in a unit test for our Spark application: > {code} > val df = mySparkSession.read.format("jdbc") > .options(Map("url" -> url, "dbtable" -> "test_table")) > .load() > df.show > println(df.rdd.collect) > {code} > The output shows the DataFrame contents from {{df.show}}. But the {{collect}} > fails: > {noformat} > org.apache.spark.SparkException: Job aborted due to stage failure: Task > serialization failed: java.util.NoSuchElementException: None.get > java.util.NoSuchElementException: None.get > at scala.None$.get(Option.scala:347) > at scala.None$.get(Option.scala:345) > at > org.apache.spark.sql.execution.DataSourceScanExec$class.org$apache$spark$sql$execution$DataSourceScanExec$$redact(DataSourceScanExec.scala:70) > at > org.apache.spark.sql.execution.DataSourceScanExec$$anonfun$4.apply(DataSourceScanExec.scala:54) > at > org.apache.spark.sql.execution.DataSourceScanExec$$anonfun$4.apply(DataSourceScanExec.scala:52) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.AbstractTraversable.map(Traversable.scala:104) > at > org.apache.spark.sql.execution.DataSourceScanExec$class.simpleString(DataSourceScanExec.scala:52) > at > org.apache.spark.sql.execution.RowDataSourceScanExec.simpleString(DataSourceScanExec.scala:75) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.verboseString(QueryPlan.scala:349) > at > org.apache.spark.sql.execution.RowDataSourceScanExec.org$apache$spark$sql$execution$DataSourceScanExec$$super$verboseString(DataSourceScanExec.scala:75) > at > org.apache.spark.sql.execution.DataSourceScanExec$class.verboseString(DataSourceScanExec.scala:60) > at > org.apache.spark.sql.execution.RowDataSourceScanExec.verboseString(DataSourceScanExec.scala:75) > at > org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:556) > at > org.apache.spark.sql.execution.WholeStageCodegenExec.generateTreeString(WholeStageCodegenExec.scala:451) > at > org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:576) > at > org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:480) > at > org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:477) > at org.apache.spark.sql.catalyst.trees.TreeNode.toString(TreeNode.scala:474) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1421) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) > at > java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) > at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) > at > java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) > at java.io.ObjectOutputStream.writeObject
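One known way for {{java.io.ObjectOutputStream.writeOrdinaryObject}} to call {{toString}} is the JVM flag {{-Dsun.io.serialization.extendedDebugInfo=true}}: with that flag set, the serializer renders every object it writes onto a debug-info stack. Below is a minimal Scala sketch (hypothetical class names, not Spark's actual code) of how a {{toString}} that dereferences an empty {{Option}}, as the {{redact}} frame in the trace above does, then makes serialization itself fail:
{code}
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Run with: -Dsun.io.serialization.extendedDebugInfo=true
// Under that flag, writeOrdinaryObject calls toString on each object it writes.
class Plan extends Serializable {
  // Stands in for driver-side state (e.g. an active session) that is absent here.
  @transient private val conf: Option[String] = None
  override def toString: String = conf.get // java.util.NoSuchElementException: None.get
}

object Repro {
  def main(args: Array[String]): Unit = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    out.writeObject(new Plan()) // fails only when the debug flag is set
  }
}
{code}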
[jira] [Commented] (SPARK-21393) spark (pyspark) crashes unpredictably when using show() or toPandas()
[ https://issues.apache.org/jira/browse/SPARK-21393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087391#comment-16087391 ] Kazuaki Ishizaki commented on SPARK-21393: -- Not yet; however, I created a patch that avoids the failure for the program in SPARK-21413. I will submit a pull request once I can create a test suite for this patch. Then, I expect it will be merged into master. > spark (pyspark) crashes unpredictably when using show() or toPandas() > - > > Key: SPARK-21393 > URL: https://issues.apache.org/jira/browse/SPARK-21393 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.1.1, 2.2.0 > Environment: Windows 10 > python 2.7 >Reporter: Zahra > Attachments: Data.zip, working_ST_pyspark.py > > > unpredictbly run into this error either when using > `pyspark.sql.DataFrame.show()` or `pyspark.sql.DataFrame.toPandas()` > error log starts with (truncated) : > {noformat} > 17/07/12 16:03:09 ERROR CodeGenerator: failed to compile: > org.codehaus.janino.JaninoRuntimeException: Code of method > "apply_47$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V" > of class > "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection" > grows beyond 64 KB > /* 001 */ public java.lang.Object generate(Object[] references) { > /* 002 */ return new SpecificUnsafeProjection(references); > /* 003 */ } > /* 004 */ > /* 005 */ class SpecificUnsafeProjection extends > org.apache.spark.sql.catalyst.expressions.UnsafeProjection { > /* 006 */ > /* 007 */ private Object[] references; > /* 008 */ private scala.collection.immutable.Set set; > /* 009 */ private scala.collection.immutable.Set set1; > /* 010 */ private scala.collection.immutable.Set set2; > /* 011 */ private scala.collection.immutable.Set set3; > /* 012 */ private UTF8String.IntWrapper wrapper; > /* 013 */ private UTF8String.IntWrapper wrapper1; > /* 014 */ private scala.collection.immutable.Set set4; > /* 015 */ private UTF8String.IntWrapper wrapper2; > /* 016 */ private UTF8String.IntWrapper wrapper3; > /* 017 */ private scala.collection.immutable.Set set5; > /* 018 */ private scala.collection.immutable.Set set6; > /* 019 */ private scala.collection.immutable.Set set7; > /* 020 */ private UTF8String.IntWrapper wrapper4; > /* 021 */ private UTF8String.IntWrapper wrapper5; > /* 022 */ private scala.collection.immutable.Set set8; > /* 023 */ private UTF8String.IntWrapper wrapper6; > /* 024 */ private UTF8String.IntWrapper wrapper7; > /* 025 */ private scala.collection.immutable.Set set9; > /* 026 */ private scala.collection.immutable.Set set10; > /* 027 */ private scala.collection.immutable.Set set11; > /* 028 */ private UTF8String.IntWrapper wrapper8; > /* 029 */ private UTF8String.IntWrapper wrapper9; > /* 030 */ private scala.collection.immutable.Set set12; > /* 031 */ private UTF8String.IntWrapper wrapper10; > /* 032 */ private UTF8String.IntWrapper wrapper11; > /* 033 */ private scala.collection.immutable.Set set13; > /* 034 */ private scala.collection.immutable.Set set14; > /* 035 */ private scala.collection.immutable.Set set15; > /* 036 */ private UTF8String.IntWrapper wrapper12; > /* 037 */ private UTF8String.IntWrapper wrapper13; > /* 038 */ private scala.collection.immutable.Set set16; > /* 039 */ private UTF8String.IntWrapper wrapper14; > /* 040 */ private UTF8String.IntWrapper wrapper15; > /* 041 */ private scala.collection.immutable.Set set17; > /* 042 */ private
scala.collection.immutable.Set set18; > /* 043 */ private scala.collection.immutable.Set set19; > /* 044 */ private UTF8String.IntWrapper wrapper16; > /* 045 */ private UTF8String.IntWrapper wrapper17; > /* 046 */ private scala.collection.immutable.Set set20; > /* 047 */ private UTF8String.IntWrapper wrapper18; > /* 048 */ private UTF8String.IntWrapper wrapper19; > /* 049 */ private scala.collection.immutable.Set set21; > /* 050 */ private scala.collection.immutable.Set set22; > /* 051 */ private scala.collection.immutable.Set set23; > /* 052 */ private UTF8String.IntWrapper wrapper20; > /* 053 */ private UTF8String.IntWrapper wrapper21; > /* 054 */ private scala.collection.immutable.Set set24; > /* 055 */ private UTF8String.IntWrapper wrapper22; > /* 056 */ private UTF8String.IntWrapper wrapper23; > /* 057 */ private scala.collection.immutable.Set set25; > /* 058 */ private scala.collection.immutable.Set set26; > /* 059 */ private scala.collection.immutable.Set set27; > /* 060 */ private UTF8String.IntWrapper wrapper24; > /* 061 */ private UTF8String.IntWrapper wrapper25; > /* 062 */ private sca
[jira] [Commented] (SPARK-21415) Triage scapegoat warnings, part 1
[ https://issues.apache.org/jira/browse/SPARK-21415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087097#comment-16087097 ] Kazuaki Ishizaki commented on SPARK-21415: -- Thank you. Would it be better to create an umbrella JIRA entry for the scapegoat triage work? > Triage scapegoat warnings, part 1 > - > > Key: SPARK-21415 > URL: https://issues.apache.org/jira/browse/SPARK-21415 > Project: Spark > Issue Type: Improvement > Components: GraphX, ML, Spark Core, SQL, Structured Streaming >Affects Versions: 2.3.0 >Reporter: Sean Owen >Assignee: Sean Owen >Priority: Minor > > Following the results of the scapegoat plugin at > https://docs.google.com/spreadsheets/d/1z7xNMjx7VCJLCiHOHhTth7Hh4R0F6LwcGjEwCDzrCiM/edit#gid=767668040 > and some initial triage, I'd like to address all of the valid instances of > some classes of warning: > - BigDecimal double constructor > - Catching NPE > - Finalizer without super > - List.size is O(n) > - Prefer Seq.empty > - Prefer Set.empty > - reverse.map instead of reverseMap > - Type shadowing > - Unnecessary if condition. > - Use .log1p > - Var could be val -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
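To make a few of the warning classes listed above concrete, here are illustrative before/after pairs (a sketch only, not code taken from the Spark tree):
{code}
object ScapegoatExamples {
  // BigDecimal double constructor: the double constructor encodes binary
  // rounding error exactly; valueOf goes through the decimal string form.
  val bad1 = new java.math.BigDecimal(0.1)         // flagged
  val good1 = java.math.BigDecimal.valueOf(0.1)    // preferred

  val xs = List(1, 2, 3)

  // List.size is O(n): use isEmpty/nonEmpty for emptiness checks.
  val bad2 = xs.size == 0                          // flagged
  val good2 = xs.isEmpty                           // preferred

  // Prefer Seq.empty over building an empty literal.
  val bad3 = Seq[Int]()                            // flagged
  val good3 = Seq.empty[Int]                       // preferred

  // reverse.map materializes an intermediate list; reverseMap does not.
  val bad4 = xs.reverse.map(_ + 1)                 // flagged
  val good4 = xs.reverseMap(_ + 1)                 // preferred

  // Use .log1p: log(1 + x) loses precision for tiny x.
  val bad5 = math.log(1.0 + 1e-12)                 // flagged
  val good5 = math.log1p(1e-12)                    // preferred
}
{code}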
[jira] [Commented] (SPARK-21413) Multiple projections with CASE WHEN fails to run generated codes
[ https://issues.apache.org/jira/browse/SPARK-21413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16086927#comment-16086927 ] Kazuaki Ishizaki commented on SPARK-21413: -- Thank you for preparing a good repro. I can reproduce this problem. I think it causes the same problem as SPARK-21393. I am working on it. > Multiple projections with CASE WHEN fails to run generated codes > > > Key: SPARK-21413 > URL: https://issues.apache.org/jira/browse/SPARK-21413 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Hyukjin Kwon > > Scala codes to reproduce are as below: > {code} > import org.apache.spark.sql.functions._ > import org.apache.spark.sql.types._ > import org.apache.spark.sql.Row > val schema = StructType(StructField("fieldA", IntegerType) :: Nil) > var df = spark.createDataFrame(spark.sparkContext.parallelize(Seq(Row(1))), > schema) > df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA")) > df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA")) > df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA")) > df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA")) > df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA")) > df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA")) > df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA")) > df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA")) > df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA")) > df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA")) > df.show() > {code} > Calling {{explain()}} on the dataframe in the former case shows a huge > case-when projection and {{show()}} fails with the exception as below: > {code} > ... > Caused by: org.codehaus.janino.JaninoRuntimeException: Code of method > "apply_0$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V" > of class > "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection" > grows beyond 64 KB > at org.codehaus.janino.CodeContext.makeSpace(CodeContext.java:949) > at org.codehaus.janino.CodeContext.write(CodeContext.java:839) > at org.codehaus.janino.UnitCompiler.writeOpcode(UnitCompiler.java:11081) > at org.codehaus.janino.UnitCompiler.pushConstant(UnitCompiler.java:9674) > at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4911) > at org.codehaus.janino.UnitCompiler.access$7700(UnitCompiler.java:206) > at > org.codehaus.janino.UnitCompiler$12.visitIntegerLiteral(UnitCompiler.java:3776) > ... > {code} > Note that, I could not reproduce this with local relation (this one appears > by {{ConvertToLocalRelation}}). 
> {code} > import org.apache.spark.sql.functions._ > var df = Seq(1).toDF("fieldA") > df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA")) > df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA")) > df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA")) > df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA")) > df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA")) > df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA")) > df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA")) > df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA")) > df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA")) > df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA")) > df.show() > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
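For background on why this fails: the generated projection class splits its body into many small {{apply_N}} methods (visible as {{apply_0$}} in the stack above) to stay under the JVM's 64 KB bytecode limit per method, and each call to a split method still consumes constant-pool entries in the main class, which is how very wide or deeply nested expressions can instead exhaust the constant pool (see SPARK-22226). A schematic Scala sketch of the splitting shape (hypothetical names; the real code generator emits Java via janino):
{code}
// Schematic only, not the actual generated source.
class SplitProjectionSketch {
  // One oversized evaluate() would exceed 64 KB of bytecode, so the body is
  // cut into many small chunks, one method per chunk...
  private def apply_0(row: Array[Any]): Unit = { /* evaluate fields 0..k */ }
  private def apply_1(row: Array[Any]): Unit = { /* evaluate fields k+1..2k */ }

  def apply(row: Array[Any]): Unit = {
    // ...and the entry point invokes each chunk in turn. In the generated
    // Java class, every such invocation adds constant-pool entries, so
    // thousands of chunks can overflow the pool even though each method is small.
    apply_0(row)
    apply_1(row)
  }
}
{code}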
[jira] [Commented] (SPARK-21393) spark (pyspark) crashes unpredictably when using show() or toPandas()
[ https://issues.apache.org/jira/browse/SPARK-21393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16086254#comment-16086254 ] Kazuaki Ishizaki commented on SPARK-21393: -- This program can cause the same exception {code} from __future__ import absolute_import, division, print_function import findspark findspark.init() import pyspark from pyspark.sql.functions import * from pyspark import SparkContext from pyspark import SparkConf from pyspark.sql import SQLContext import pyspark.sql.functions as sf sc = SparkContext() sqlContext = SQLContext(sc) ### data df = sqlContext.read.load('./Data/claims.csv', format='com.databricks.spark.csv', header=True) df_new = df.withColumn('service_type_col',sf.when((sf.col('RevenueCategory') == "Emergency Room") | (sf.col('CPT_Name') == "EMERGENCY DEPT VISIT"), 'EMERGENCY_CARE').otherwise(0)) df_new = df_new.withColumn('service_type_col', sf.when((sf.col('ProcedureCategory').isin([ "Laboratory, General"])) & (sf.col('service_type_col') == 0), 'LAB_AND_PATHOLOGY').otherwise(df_new.service_type_col)) df_new = df_new.withColumn('service_type_col', sf.when((sf.col('service_type_col') == 0), 'ROUTINE_RADIOLOGY').otherwise(df_new.service_type_col)) df_new = df_new.withColumn('service_type_col', sf.when((sf.col('CPT_Code').isin(["70336"])) & (sf.col('service_type_col') == 0), 'ADVANCED_IMAGING').otherwise(df_new.service_type_col)) df_new = df_new.withColumn('service_type_col', sf.when((sf.col('service_type_col') == 0), 'DURABLE_MEDICAL_EQUIPMENT').otherwise(df_new.service_type_col)) df_new = df_new.withColumn('service_type_col', sf.when((sf.col('CPT_Name').isin(['CHIROPRACTIC MANIPULATION'])) & (sf.col('service_type_col') == 0), 'CHIROPRACTIC').otherwise(df_new.service_type_col)) df_new = df_new.withColumn('service_type_col', sf.when((sf.col('service_type_col') == 0), 'AMBULANCE').otherwise(df_new.service_type_col)) df_new = df_new.withColumn('service_type_col', sf.when((sf.col('service_type_col') == 0), 'RX_MAIL').otherwise(df_new.service_type_col)) df_new.show() {code} > spark (pyspark) crashes unpredictably when using show() or toPandas() > - > > Key: SPARK-21393 > URL: https://issues.apache.org/jira/browse/SPARK-21393 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.1.1, 2.2.0 > Environment: Windows 10 > python 2.7 >Reporter: Zahra > Attachments: Data.zip, working_ST_pyspark.py > > > unpredictbly run into this error either when using > `pyspark.sql.DataFrame.show()` or `pyspark.sql.DataFrame.toPandas()` > error log starts with (truncated) : > {noformat} > 17/07/12 16:03:09 ERROR CodeGenerator: failed to compile: > org.codehaus.janino.JaninoRuntimeException: Code of method > "apply_47$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V" > of class > "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection" > grows beyond 64 KB > /* 001 */ public java.lang.Object generate(Object[] references) { > /* 002 */ return new SpecificUnsafeProjection(references); > /* 003 */ } > /* 004 */ > /* 005 */ class SpecificUnsafeProjection extends > org.apache.spark.sql.catalyst.expressions.UnsafeProjection { > /* 006 */ > /* 007 */ private Object[] references; > /* 008 */ private scala.collection.immutable.Set set; > /* 009 */ private scala.collection.immutable.Set set1; > /* 010 */ private scala.collection.immutable.Set set2; > /* 011 */ private scala.collection.immutable.Set set3; > /* 012 */ private 
UTF8String.IntWrapper wrapper; > /* 013 */ private UTF8String.IntWrapper wrapper1; > /* 014 */ private scala.collection.immutable.Set set4; > /* 015 */ private UTF8String.IntWrapper wrapper2; > /* 016 */ private UTF8String.IntWrapper wrapper3; > /* 017 */ private scala.collection.immutable.Set set5; > /* 018 */ private scala.collection.immutable.Set set6; > /* 019 */ private scala.collection.immutable.Set set7; > /* 020 */ private UTF8String.IntWrapper wrapper4; > /* 021 */ private UTF8String.IntWrapper wrapper5; > /* 022 */ private scala.collection.immutable.Set set8; > /* 023 */ private UTF8String.IntWrapper wrapper6; > /* 024 */ private UTF8String.IntWrapper wrapper7; > /* 025 */ private scala.collection.immutable.Set set9; > /* 026 */ private scala.collection.immutable.Set set10; > /* 027 */ private scala.collection.immutable.Set set11; > /* 028 */ private UTF8String.IntWrapper wrapper8; > /* 029 */ private UTF8String.IntWrapper wrapper9; > /* 030 */ private scala.collection.immutable.Set set12; > /* 031 */ private UTF8String.IntWrapper wrapper10; > /* 032 */ private UTF8String.IntWrapper wrapper11; > /* 033 */ private scala.co
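The shared mechanism behind this report and SPARK-21413: each {{withColumn}} wrapping the previous column in {{when(...).otherwise(...)}} references that column more than once, and when the optimizer collapses adjacent projections the whole previous expression is inlined at every reference, so the generated code grows roughly geometrically with the number of chained calls. A small Scala sketch to observe the growth (an illustration under those assumptions, not a test from the Spark tree; it uses {{spark.range}} as a non-local source because, as noted in SPARK-21413, a purely local relation is folded away by {{ConvertToLocalRelation}}):
{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object CaseWhenGrowth {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[1]").appName("growth").getOrCreate()
    import spark.implicits._

    var df = spark.range(1).selectExpr("cast(id as int) as fieldA")
    for (i <- 1 to 8) {
      df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
      // The optimized plan carries the collapsed projection, so its printed
      // length is a rough proxy for the size of the code to be generated.
      println(s"round $i: optimized plan length = " +
        df.queryExecution.optimizedPlan.toString.length)
    }
    spark.stop()
  }
}
{code}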
[jira] [Comment Edited] (SPARK-21393) spark (pyspark) crashes unpredictably when using show() or toPandas()
[ https://issues.apache.org/jira/browse/SPARK-21393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16086189#comment-16086189 ] Kazuaki Ishizaki edited comment on SPARK-21393 at 7/13/17 6:39 PM: --- Thank you for uploading files. When I insert {df_new.show()} at appropriate places, I can reproduce this problem on Spark 2.1.1 or Spark 2.2. I am reducing the number of lines of this program. was (Author: kiszk): Thank you for uploading files. When I insert {df_new.show()} at appropriate places, I can reproduce this problem on Spark 2.1.1 or Spark 2.2. I am reducing the program. > spark (pyspark) crashes unpredictably when using show() or toPandas() > - > > Key: SPARK-21393 > URL: https://issues.apache.org/jira/browse/SPARK-21393 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.1.1, 2.2.0 > Environment: Windows 10 > python 2.7 >Reporter: Zahra > Attachments: Data.zip, working_ST_pyspark.py > > > unpredictbly run into this error either when using > `pyspark.sql.DataFrame.show()` or `pyspark.sql.DataFrame.toPandas()` > error log starts with (truncated) : > {noformat} > 17/07/12 16:03:09 ERROR CodeGenerator: failed to compile: > org.codehaus.janino.JaninoRuntimeException: Code of method > "apply_47$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V" > of class > "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection" > grows beyond 64 KB > /* 001 */ public java.lang.Object generate(Object[] references) { > /* 002 */ return new SpecificUnsafeProjection(references); > /* 003 */ } > /* 004 */ > /* 005 */ class SpecificUnsafeProjection extends > org.apache.spark.sql.catalyst.expressions.UnsafeProjection { > /* 006 */ > /* 007 */ private Object[] references; > /* 008 */ private scala.collection.immutable.Set set; > /* 009 */ private scala.collection.immutable.Set set1; > /* 010 */ private scala.collection.immutable.Set set2; > /* 011 */ private scala.collection.immutable.Set set3; > /* 012 */ private UTF8String.IntWrapper wrapper; > /* 013 */ private UTF8String.IntWrapper wrapper1; > /* 014 */ private scala.collection.immutable.Set set4; > /* 015 */ private UTF8String.IntWrapper wrapper2; > /* 016 */ private UTF8String.IntWrapper wrapper3; > /* 017 */ private scala.collection.immutable.Set set5; > /* 018 */ private scala.collection.immutable.Set set6; > /* 019 */ private scala.collection.immutable.Set set7; > /* 020 */ private UTF8String.IntWrapper wrapper4; > /* 021 */ private UTF8String.IntWrapper wrapper5; > /* 022 */ private scala.collection.immutable.Set set8; > /* 023 */ private UTF8String.IntWrapper wrapper6; > /* 024 */ private UTF8String.IntWrapper wrapper7; > /* 025 */ private scala.collection.immutable.Set set9; > /* 026 */ private scala.collection.immutable.Set set10; > /* 027 */ private scala.collection.immutable.Set set11; > /* 028 */ private UTF8String.IntWrapper wrapper8; > /* 029 */ private UTF8String.IntWrapper wrapper9; > /* 030 */ private scala.collection.immutable.Set set12; > /* 031 */ private UTF8String.IntWrapper wrapper10; > /* 032 */ private UTF8String.IntWrapper wrapper11; > /* 033 */ private scala.collection.immutable.Set set13; > /* 034 */ private scala.collection.immutable.Set set14; > /* 035 */ private scala.collection.immutable.Set set15; > /* 036 */ private UTF8String.IntWrapper wrapper12; > /* 037 */ private UTF8String.IntWrapper wrapper13; > /* 038 */ private scala.collection.immutable.Set 
set16; > /* 039 */ private UTF8String.IntWrapper wrapper14; > /* 040 */ private UTF8String.IntWrapper wrapper15; > /* 041 */ private scala.collection.immutable.Set set17; > /* 042 */ private scala.collection.immutable.Set set18; > /* 043 */ private scala.collection.immutable.Set set19; > /* 044 */ private UTF8String.IntWrapper wrapper16; > /* 045 */ private UTF8String.IntWrapper wrapper17; > /* 046 */ private scala.collection.immutable.Set set20; > /* 047 */ private UTF8String.IntWrapper wrapper18; > /* 048 */ private UTF8String.IntWrapper wrapper19; > /* 049 */ private scala.collection.immutable.Set set21; > /* 050 */ private scala.collection.immutable.Set set22; > /* 051 */ private scala.collection.immutable.Set set23; > /* 052 */ private UTF8String.IntWrapper wrapper20; > /* 053 */ private UTF8String.IntWrapper wrapper21; > /* 054 */ private scala.collection.immutable.Set set24; > /* 055 */ private UTF8String.IntWrapper wrapper22; > /* 056 */ private UTF8String.IntWrapper wrapper23; > /* 057 */ private scala.collection.immutable.Set set25; > /* 058 */ private scala.collection
[jira] [Commented] (SPARK-21393) spark (pyspark) crashes unpredictably when using show() or toPandas()
[ https://issues.apache.org/jira/browse/SPARK-21393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16086189#comment-16086189 ] Kazuaki Ishizaki commented on SPARK-21393: -- Thank you for uploading files. When I insert {df_new.show()} at appropriate places, I can reproduce this problem on Spark 2.1.1 or Spark 2.2. I am reducing the program. > spark (pyspark) crashes unpredictably when using show() or toPandas() > - > > Key: SPARK-21393 > URL: https://issues.apache.org/jira/browse/SPARK-21393 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.1.1, 2.2.0 > Environment: Windows 10 > python 2.7 >Reporter: Zahra > Attachments: Data.zip, working_ST_pyspark.py > > > unpredictbly run into this error either when using > `pyspark.sql.DataFrame.show()` or `pyspark.sql.DataFrame.toPandas()` > error log starts with (truncated) : > {noformat} > 17/07/12 16:03:09 ERROR CodeGenerator: failed to compile: > org.codehaus.janino.JaninoRuntimeException: Code of method > "apply_47$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V" > of class > "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection" > grows beyond 64 KB > /* 001 */ public java.lang.Object generate(Object[] references) { > /* 002 */ return new SpecificUnsafeProjection(references); > /* 003 */ } > /* 004 */ > /* 005 */ class SpecificUnsafeProjection extends > org.apache.spark.sql.catalyst.expressions.UnsafeProjection { > /* 006 */ > /* 007 */ private Object[] references; > /* 008 */ private scala.collection.immutable.Set set; > /* 009 */ private scala.collection.immutable.Set set1; > /* 010 */ private scala.collection.immutable.Set set2; > /* 011 */ private scala.collection.immutable.Set set3; > /* 012 */ private UTF8String.IntWrapper wrapper; > /* 013 */ private UTF8String.IntWrapper wrapper1; > /* 014 */ private scala.collection.immutable.Set set4; > /* 015 */ private UTF8String.IntWrapper wrapper2; > /* 016 */ private UTF8String.IntWrapper wrapper3; > /* 017 */ private scala.collection.immutable.Set set5; > /* 018 */ private scala.collection.immutable.Set set6; > /* 019 */ private scala.collection.immutable.Set set7; > /* 020 */ private UTF8String.IntWrapper wrapper4; > /* 021 */ private UTF8String.IntWrapper wrapper5; > /* 022 */ private scala.collection.immutable.Set set8; > /* 023 */ private UTF8String.IntWrapper wrapper6; > /* 024 */ private UTF8String.IntWrapper wrapper7; > /* 025 */ private scala.collection.immutable.Set set9; > /* 026 */ private scala.collection.immutable.Set set10; > /* 027 */ private scala.collection.immutable.Set set11; > /* 028 */ private UTF8String.IntWrapper wrapper8; > /* 029 */ private UTF8String.IntWrapper wrapper9; > /* 030 */ private scala.collection.immutable.Set set12; > /* 031 */ private UTF8String.IntWrapper wrapper10; > /* 032 */ private UTF8String.IntWrapper wrapper11; > /* 033 */ private scala.collection.immutable.Set set13; > /* 034 */ private scala.collection.immutable.Set set14; > /* 035 */ private scala.collection.immutable.Set set15; > /* 036 */ private UTF8String.IntWrapper wrapper12; > /* 037 */ private UTF8String.IntWrapper wrapper13; > /* 038 */ private scala.collection.immutable.Set set16; > /* 039 */ private UTF8String.IntWrapper wrapper14; > /* 040 */ private UTF8String.IntWrapper wrapper15; > /* 041 */ private scala.collection.immutable.Set set17; > /* 042 */ private scala.collection.immutable.Set set18; > /* 043 */ 
private scala.collection.immutable.Set set19; > /* 044 */ private UTF8String.IntWrapper wrapper16; > /* 045 */ private UTF8String.IntWrapper wrapper17; > /* 046 */ private scala.collection.immutable.Set set20; > /* 047 */ private UTF8String.IntWrapper wrapper18; > /* 048 */ private UTF8String.IntWrapper wrapper19; > /* 049 */ private scala.collection.immutable.Set set21; > /* 050 */ private scala.collection.immutable.Set set22; > /* 051 */ private scala.collection.immutable.Set set23; > /* 052 */ private UTF8String.IntWrapper wrapper20; > /* 053 */ private UTF8String.IntWrapper wrapper21; > /* 054 */ private scala.collection.immutable.Set set24; > /* 055 */ private UTF8String.IntWrapper wrapper22; > /* 056 */ private UTF8String.IntWrapper wrapper23; > /* 057 */ private scala.collection.immutable.Set set25; > /* 058 */ private scala.collection.immutable.Set set26; > /* 059 */ private scala.collection.immutable.Set set27; > /* 060 */ private UTF8String.IntWrapper wrapper24; > /* 061 */ private UTF8String.IntWrapper wrapper25; > /* 062 */ private scala.collection.immutable.Set set28; > /* 063 */
[jira] [Updated] (SPARK-21393) spark (pyspark) crashes unpredictably when using show() or toPandas()
[ https://issues.apache.org/jira/browse/SPARK-21393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki updated SPARK-21393: - Affects Version/s: 2.2.0 > spark (pyspark) crashes unpredictably when using show() or toPandas() > - > > Key: SPARK-21393 > URL: https://issues.apache.org/jira/browse/SPARK-21393 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.1.1, 2.2.0 > Environment: Windows 10 > python 2.7 >Reporter: Zahra > Attachments: Data.zip, working_ST_pyspark.py > > > unpredictbly run into this error either when using > `pyspark.sql.DataFrame.show()` or `pyspark.sql.DataFrame.toPandas()` > error log starts with (truncated) : > {noformat} > 17/07/12 16:03:09 ERROR CodeGenerator: failed to compile: > org.codehaus.janino.JaninoRuntimeException: Code of method > "apply_47$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V" > of class > "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection" > grows beyond 64 KB > /* 001 */ public java.lang.Object generate(Object[] references) { > /* 002 */ return new SpecificUnsafeProjection(references); > /* 003 */ } > /* 004 */ > /* 005 */ class SpecificUnsafeProjection extends > org.apache.spark.sql.catalyst.expressions.UnsafeProjection { > /* 006 */ > /* 007 */ private Object[] references; > /* 008 */ private scala.collection.immutable.Set set; > /* 009 */ private scala.collection.immutable.Set set1; > /* 010 */ private scala.collection.immutable.Set set2; > /* 011 */ private scala.collection.immutable.Set set3; > /* 012 */ private UTF8String.IntWrapper wrapper; > /* 013 */ private UTF8String.IntWrapper wrapper1; > /* 014 */ private scala.collection.immutable.Set set4; > /* 015 */ private UTF8String.IntWrapper wrapper2; > /* 016 */ private UTF8String.IntWrapper wrapper3; > /* 017 */ private scala.collection.immutable.Set set5; > /* 018 */ private scala.collection.immutable.Set set6; > /* 019 */ private scala.collection.immutable.Set set7; > /* 020 */ private UTF8String.IntWrapper wrapper4; > /* 021 */ private UTF8String.IntWrapper wrapper5; > /* 022 */ private scala.collection.immutable.Set set8; > /* 023 */ private UTF8String.IntWrapper wrapper6; > /* 024 */ private UTF8String.IntWrapper wrapper7; > /* 025 */ private scala.collection.immutable.Set set9; > /* 026 */ private scala.collection.immutable.Set set10; > /* 027 */ private scala.collection.immutable.Set set11; > /* 028 */ private UTF8String.IntWrapper wrapper8; > /* 029 */ private UTF8String.IntWrapper wrapper9; > /* 030 */ private scala.collection.immutable.Set set12; > /* 031 */ private UTF8String.IntWrapper wrapper10; > /* 032 */ private UTF8String.IntWrapper wrapper11; > /* 033 */ private scala.collection.immutable.Set set13; > /* 034 */ private scala.collection.immutable.Set set14; > /* 035 */ private scala.collection.immutable.Set set15; > /* 036 */ private UTF8String.IntWrapper wrapper12; > /* 037 */ private UTF8String.IntWrapper wrapper13; > /* 038 */ private scala.collection.immutable.Set set16; > /* 039 */ private UTF8String.IntWrapper wrapper14; > /* 040 */ private UTF8String.IntWrapper wrapper15; > /* 041 */ private scala.collection.immutable.Set set17; > /* 042 */ private scala.collection.immutable.Set set18; > /* 043 */ private scala.collection.immutable.Set set19; > /* 044 */ private UTF8String.IntWrapper wrapper16; > /* 045 */ private UTF8String.IntWrapper wrapper17; > /* 046 */ private 
scala.collection.immutable.Set set20; > /* 047 */ private UTF8String.IntWrapper wrapper18; > /* 048 */ private UTF8String.IntWrapper wrapper19; > /* 049 */ private scala.collection.immutable.Set set21; > /* 050 */ private scala.collection.immutable.Set set22; > /* 051 */ private scala.collection.immutable.Set set23; > /* 052 */ private UTF8String.IntWrapper wrapper20; > /* 053 */ private UTF8String.IntWrapper wrapper21; > /* 054 */ private scala.collection.immutable.Set set24; > /* 055 */ private UTF8String.IntWrapper wrapper22; > /* 056 */ private UTF8String.IntWrapper wrapper23; > /* 057 */ private scala.collection.immutable.Set set25; > /* 058 */ private scala.collection.immutable.Set set26; > /* 059 */ private scala.collection.immutable.Set set27; > /* 060 */ private UTF8String.IntWrapper wrapper24; > /* 061 */ private UTF8String.IntWrapper wrapper25; > /* 062 */ private scala.collection.immutable.Set set28; > /* 063 */ private UTF8String.IntWrapper wrapper26; > /* 064 */ private UTF8String.IntWrapper wrapper27; > /* 065 */ private scala.collection.immutable.Set set29; > /* 066 */ private scala.collection.immu
[jira] [Commented] (SPARK-21391) Cannot convert a Seq of Map whose value type is again a seq, into a dataset
[ https://issues.apache.org/jira/browse/SPARK-21391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085939#comment-16085939 ] Kazuaki Ishizaki commented on SPARK-21391: -- I created [a PR|https://github.com/apache/spark/pull/18626] to solve this problem in Spark 2.1 > Cannot convert a Seq of Map whose value type is again a seq, into a dataset > > > Key: SPARK-21391 > URL: https://issues.apache.org/jira/browse/SPARK-21391 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 > Environment: Seen on mac OSX, scala 2.11, java 8 >Reporter: indraneel rao > > There is an error while trying to create a dataset from a sequence of Maps, > whose values have any kind of collections. Even when they are wrapped in a > case class. > Eg : The following piece of code throws an error: > > {code:java} > case class Values(values: Seq[Double]) > case class ItemProperties(properties:Map[String,Values]) > val l1 = List(ItemProperties( > Map( > "A1" -> Values(Seq(1.0,2.0)), > "B1" -> Values(Seq(44.0,55.0)) > ) > ), > ItemProperties( > Map( > "A2" -> Values(Seq(123.0,25.0)), > "B2" -> Values(Seq(445.0,35.0)) > ) > ) > ) > l1.toDS().show() > {code} > Heres the error: > 17/07/12 21:59:35 ERROR CodeGenerator: failed to compile: > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 65, Column 46: Expression "ExternalMapToCatalyst_value_isNull0" is not an > rvalue > /* 001 */ public java.lang.Object generate(Object[] references) { > /* 002 */ return new SpecificUnsafeProjection(references); > /* 003 */ } > /* 004 */ > /* 005 */ class SpecificUnsafeProjection extends > org.apache.spark.sql.catalyst.expressions.UnsafeProjection { > /* 006 */ > /* 007 */ private Object[] references; > /* 008 */ private boolean resultIsNull; > /* 009 */ private java.lang.String argValue; > /* 010 */ private Object[] values; > /* 011 */ private boolean resultIsNull1; > /* 012 */ private scala.collection.Seq argValue1; > /* 013 */ private boolean isNull11; > /* 014 */ private boolean value11; > /* 015 */ private boolean isNull12; > /* 016 */ private InternalRow value12; > /* 017 */ private boolean isNull13; > /* 018 */ private InternalRow value13; > /* 019 */ private UnsafeRow result; > /* 020 */ private > org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder holder; > /* 021 */ private > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter; > /* 022 */ private > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter > arrayWriter; > /* 023 */ private > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter > arrayWriter1; > /* 024 */ private > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter1; > /* 025 */ private > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter > arrayWriter2; > /* 026 */ > /* 027 */ public SpecificUnsafeProjection(Object[] references) { > /* 028 */ this.references = references; > /* 029 */ > /* 030 */ > /* 031 */ this.values = null; > /* 032 */ > /* 033 */ > /* 034 */ isNull11 = false; > /* 035 */ value11 = false; > /* 036 */ isNull12 = false; > /* 037 */ value12 = null; > /* 038 */ isNull13 = false; > /* 039 */ value13 = null; > /* 040 */ result = new UnsafeRow(1); > /* 041 */ this.holder = new > org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(result, 32); > /* 042 */ this.rowWriter = new > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1); > /* 043 */ this.arrayWriter = new > 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter(); > /* 044 */ this.arrayWriter1 = new > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter(); > /* 045 */ this.rowWriter1 = new > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1); > /* 046 */ this.arrayWriter2 = new > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter(); > /* 047 */ > /* 048 */ } > /* 049 */ > /* 050 */ public void initialize(int partitionIndex) { > /* 051 */ > /* 052 */ } > /* 053 */ > /* 054 */ > /* 055 */ private void evalIfTrueExpr(InternalRow i) { > /* 056 */ final InternalRow value7 = null; > /* 057 */ isNull12 = true; > /* 058 */ value12 = value7; > /* 059 */ } > /* 060 */ > /* 061 */ > /* 062 */ private void evalIfCondExpr(InternalRow i) { > /* 063 */ > /* 064 */ isNull11 = false; > /* 065 */ value11 = ExternalMapToCatalyst_value_isNull0; > /* 066 */ } > /* 067 */ > /* 068 */ > /* 069 */ private void evalIfF
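Until a fix is available on 2.1, one possible workaround for encoder codegen failures like this is to bypass the generated encoder with a Kryo encoder; this avoids the failing {{ExternalMapToCatalyst}} path at the cost of storing each object as a single opaque binary column. A sketch under that assumption (not an official recommendation):
{code}
import org.apache.spark.sql.{Encoders, SparkSession}

case class Values(values: Seq[Double])
case class ItemProperties(properties: Map[String, Values])

object KryoWorkaround {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[1]").appName("workaround").getOrCreate()
    val l1 = List(
      ItemProperties(Map("A1" -> Values(Seq(1.0, 2.0)), "B1" -> Values(Seq(44.0, 55.0)))),
      ItemProperties(Map("A2" -> Values(Seq(123.0, 25.0)), "B2" -> Values(Seq(445.0, 35.0))))
    )
    // Kryo serializes each object into one binary column, so no per-field
    // serializer code is generated for the nested Map[String, Values].
    val ds = spark.createDataset(l1)(Encoders.kryo[ItemProperties])
    println(ds.count()) // contents are opaque bytes, so count() rather than show()
    spark.stop()
  }
}
{code}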
[jira] [Updated] (SPARK-21390) Dataset filter api inconsistency
[ https://issues.apache.org/jira/browse/SPARK-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki updated SPARK-21390: - Affects Version/s: 2.1.0 > Dataset filter api inconsistency > > > Key: SPARK-21390 > URL: https://issues.apache.org/jira/browse/SPARK-21390 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.1, 2.1.0, 2.2.0 >Reporter: Gheorghe Gheorghe >Priority: Minor > > Hello everybody, > I've encountered a strange situation with the spark-shell. > When I run the code below in my IDE the second test case prints as expected > count "1". However, when I run the same code using the spark-shell in the > second test case I get 0 back as a count. > I've made sure that I'm running scala 2.11.8 and spark 2.0.1 in both my IDE > and spark-shell. > {code:java} > import org.apache.spark.sql.Dataset > case class SomeClass(field1:String, field2:String) > val filterCondition: Seq[SomeClass] = Seq( SomeClass("00", "01") ) > // Test 1 > val filterMe1: Dataset[SomeClass] = Seq( SomeClass("00", "01") ).toDS > > println("Works fine!" +filterMe1.filter(filterCondition.contains(_)).count) > > // Test 2 > case class OtherClass(field1:String, field2:String) > > val filterMe2 = Seq( OtherClass("00", "01"), OtherClass("00", "02")).toDS > println("Fail, count should return 1: " + filterMe2.filter(x=> > filterCondition.contains(SomeClass(x.field1, x.field2))).count) > {code} > Note if I transform the dataset first I get 1 back as expected. > {code:java} > println(filterMe2.map(x=> SomeClass(x.field1, > x.field2)).filter(filterCondition.contains(_)).count) > {code} > Is this a bug? I can see that this filter function has been marked as > experimental > https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21390) Dataset filter api inconsistency
[ https://issues.apache.org/jira/browse/SPARK-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki updated SPARK-21390: - Affects Version/s: 2.2.0 > Dataset filter api inconsistency > > > Key: SPARK-21390 > URL: https://issues.apache.org/jira/browse/SPARK-21390 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.1, 2.1.0, 2.2.0 >Reporter: Gheorghe Gheorghe >Priority: Minor > > Hello everybody, > I've encountered a strange situation with the spark-shell. > When I run the code below in my IDE the second test case prints as expected > count "1". However, when I run the same code using the spark-shell in the > second test case I get 0 back as a count. > I've made sure that I'm running scala 2.11.8 and spark 2.0.1 in both my IDE > and spark-shell. > {code:java} > import org.apache.spark.sql.Dataset > case class SomeClass(field1:String, field2:String) > val filterCondition: Seq[SomeClass] = Seq( SomeClass("00", "01") ) > // Test 1 > val filterMe1: Dataset[SomeClass] = Seq( SomeClass("00", "01") ).toDS > > println("Works fine!" +filterMe1.filter(filterCondition.contains(_)).count) > > // Test 2 > case class OtherClass(field1:String, field2:String) > > val filterMe2 = Seq( OtherClass("00", "01"), OtherClass("00", "02")).toDS > println("Fail, count should return 1: " + filterMe2.filter(x=> > filterCondition.contains(SomeClass(x.field1, x.field2))).count) > {code} > Note if I transform the dataset first I get 1 back as expected. > {code:java} > println(filterMe2.map(x=> SomeClass(x.field1, > x.field2)).filter(filterCondition.contains(_)).count) > {code} > Is this a bug? I can see that this filter function has been marked as > experimental > https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21393) spark (pyspark) crashes unpredictably when using show() or toPandas()
[ https://issues.apache.org/jira/browse/SPARK-21393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085726#comment-16085726 ] Kazuaki Ishizaki commented on SPARK-21393: -- This program seems to require 7 CSV files to run. Could you please attach these CSV files? > spark (pyspark) crashes unpredictably when using show() or toPandas() > - > > Key: SPARK-21393 > URL: https://issues.apache.org/jira/browse/SPARK-21393 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.1.1 > Environment: Windows 10 > python 2.7 >Reporter: Zahra > Attachments: working_ST_pyspark.py > > > unpredictbly run into this error either when using > `pyspark.sql.DataFrame.show()` or `pyspark.sql.DataFrame.toPandas()` > error log starts with (truncated) : > {noformat} > 17/07/12 16:03:09 ERROR CodeGenerator: failed to compile: > org.codehaus.janino.JaninoRuntimeException: Code of method > "apply_47$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V" > of class > "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection" > grows beyond 64 KB > /* 001 */ public java.lang.Object generate(Object[] references) { > /* 002 */ return new SpecificUnsafeProjection(references); > /* 003 */ } > /* 004 */ > /* 005 */ class SpecificUnsafeProjection extends > org.apache.spark.sql.catalyst.expressions.UnsafeProjection { > /* 006 */ > /* 007 */ private Object[] references; > /* 008 */ private scala.collection.immutable.Set set; > /* 009 */ private scala.collection.immutable.Set set1; > /* 010 */ private scala.collection.immutable.Set set2; > /* 011 */ private scala.collection.immutable.Set set3; > /* 012 */ private UTF8String.IntWrapper wrapper; > /* 013 */ private UTF8String.IntWrapper wrapper1; > /* 014 */ private scala.collection.immutable.Set set4; > /* 015 */ private UTF8String.IntWrapper wrapper2; > /* 016 */ private UTF8String.IntWrapper wrapper3; > /* 017 */ private scala.collection.immutable.Set set5; > /* 018 */ private scala.collection.immutable.Set set6; > /* 019 */ private scala.collection.immutable.Set set7; > /* 020 */ private UTF8String.IntWrapper wrapper4; > /* 021 */ private UTF8String.IntWrapper wrapper5; > /* 022 */ private scala.collection.immutable.Set set8; > /* 023 */ private UTF8String.IntWrapper wrapper6; > /* 024 */ private UTF8String.IntWrapper wrapper7; > /* 025 */ private scala.collection.immutable.Set set9; > /* 026 */ private scala.collection.immutable.Set set10; > /* 027 */ private scala.collection.immutable.Set set11; > /* 028 */ private UTF8String.IntWrapper wrapper8; > /* 029 */ private UTF8String.IntWrapper wrapper9; > /* 030 */ private scala.collection.immutable.Set set12; > /* 031 */ private UTF8String.IntWrapper wrapper10; > /* 032 */ private UTF8String.IntWrapper wrapper11; > /* 033 */ private scala.collection.immutable.Set set13; > /* 034 */ private scala.collection.immutable.Set set14; > /* 035 */ private scala.collection.immutable.Set set15; > /* 036 */ private UTF8String.IntWrapper wrapper12; > /* 037 */ private UTF8String.IntWrapper wrapper13; > /* 038 */ private scala.collection.immutable.Set set16; > /* 039 */ private UTF8String.IntWrapper wrapper14; > /* 040 */ private UTF8String.IntWrapper wrapper15; > /* 041 */ private scala.collection.immutable.Set set17; > /* 042 */ private scala.collection.immutable.Set set18; > /* 043 */ private scala.collection.immutable.Set set19; > /* 044 */ private
UTF8String.IntWrapper wrapper16; > /* 045 */ private UTF8String.IntWrapper wrapper17; > /* 046 */ private scala.collection.immutable.Set set20; > /* 047 */ private UTF8String.IntWrapper wrapper18; > /* 048 */ private UTF8String.IntWrapper wrapper19; > /* 049 */ private scala.collection.immutable.Set set21; > /* 050 */ private scala.collection.immutable.Set set22; > /* 051 */ private scala.collection.immutable.Set set23; > /* 052 */ private UTF8String.IntWrapper wrapper20; > /* 053 */ private UTF8String.IntWrapper wrapper21; > /* 054 */ private scala.collection.immutable.Set set24; > /* 055 */ private UTF8String.IntWrapper wrapper22; > /* 056 */ private UTF8String.IntWrapper wrapper23; > /* 057 */ private scala.collection.immutable.Set set25; > /* 058 */ private scala.collection.immutable.Set set26; > /* 059 */ private scala.collection.immutable.Set set27; > /* 060 */ private UTF8String.IntWrapper wrapper24; > /* 061 */ private UTF8String.IntWrapper wrapper25; > /* 062 */ private scala.collection.immutable.Set set28; > /* 063 */ private UTF8String.IntWrapper wrapper26; > /* 064 */ private UTF8String.In
[jira] [Commented] (SPARK-21391) Cannot convert a Seq of Map whose value type is again a seq, into a dataset
[ https://issues.apache.org/jira/browse/SPARK-21391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085342#comment-16085342 ] Kazuaki Ishizaki commented on SPARK-21391: -- [~neelrr] Do you want to have this fix in a future release of 2.1? If so, I will make a PR for the backport. > Cannot convert a Seq of Map whose value type is again a seq, into a dataset > > > Key: SPARK-21391 > URL: https://issues.apache.org/jira/browse/SPARK-21391 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 > Environment: Seen on mac OSX, scala 2.11, java 8 >Reporter: indraneel rao > > There is an error while trying to create a dataset from a sequence of Maps, > whose values have any kind of collections. Even when they are wrapped in a > case class. > Eg : The following piece of code throws an error: > > {code:java} > case class Values(values: Seq[Double]) > case class ItemProperties(properties:Map[String,Values]) > val l1 = List(ItemProperties( > Map( > "A1" -> Values(Seq(1.0,2.0)), > "B1" -> Values(Seq(44.0,55.0)) > ) > ), > ItemProperties( > Map( > "A2" -> Values(Seq(123.0,25.0)), > "B2" -> Values(Seq(445.0,35.0)) > ) > ) > ) > l1.toDS().show() > {code} > Heres the error: > 17/07/12 21:59:35 ERROR CodeGenerator: failed to compile: > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 65, Column 46: Expression "ExternalMapToCatalyst_value_isNull0" is not an > rvalue > /* 001 */ public java.lang.Object generate(Object[] references) { > /* 002 */ return new SpecificUnsafeProjection(references); > /* 003 */ } > /* 004 */ > /* 005 */ class SpecificUnsafeProjection extends > org.apache.spark.sql.catalyst.expressions.UnsafeProjection { > /* 006 */ > /* 007 */ private Object[] references; > /* 008 */ private boolean resultIsNull; > /* 009 */ private java.lang.String argValue; > /* 010 */ private Object[] values; > /* 011 */ private boolean resultIsNull1; > /* 012 */ private scala.collection.Seq argValue1; > /* 013 */ private boolean isNull11; > /* 014 */ private boolean value11; > /* 015 */ private boolean isNull12; > /* 016 */ private InternalRow value12; > /* 017 */ private boolean isNull13; > /* 018 */ private InternalRow value13; > /* 019 */ private UnsafeRow result; > /* 020 */ private > org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder holder; > /* 021 */ private > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter; > /* 022 */ private > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter > arrayWriter; > /* 023 */ private > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter > arrayWriter1; > /* 024 */ private > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter1; > /* 025 */ private > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter > arrayWriter2; > /* 026 */ > /* 027 */ public SpecificUnsafeProjection(Object[] references) { > /* 028 */ this.references = references; > /* 029 */ > /* 030 */ > /* 031 */ this.values = null; > /* 032 */ > /* 033 */ > /* 034 */ isNull11 = false; > /* 035 */ value11 = false; > /* 036 */ isNull12 = false; > /* 037 */ value12 = null; > /* 038 */ isNull13 = false; > /* 039 */ value13 = null; > /* 040 */ result = new UnsafeRow(1); > /* 041 */ this.holder = new > org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(result, 32); > /* 042 */ this.rowWriter = new > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1); > /* 043 */ this.arrayWriter = new >
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter(); > /* 044 */ this.arrayWriter1 = new > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter(); > /* 045 */ this.rowWriter1 = new > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1); > /* 046 */ this.arrayWriter2 = new > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter(); > /* 047 */ > /* 048 */ } > /* 049 */ > /* 050 */ public void initialize(int partitionIndex) { > /* 051 */ > /* 052 */ } > /* 053 */ > /* 054 */ > /* 055 */ private void evalIfTrueExpr(InternalRow i) { > /* 056 */ final InternalRow value7 = null; > /* 057 */ isNull12 = true; > /* 058 */ value12 = value7; > /* 059 */ } > /* 060 */ > /* 061 */ > /* 062 */ private void evalIfCondExpr(InternalRow i) { > /* 063 */ > /* 064 */ isNull11 = false; > /* 065 */ value11 = ExternalMapToCatalyst_value_isNull0; > /* 066 */ } > /* 067 */ > /* 068 */ > /* 069 */ private void
[jira] [Comment Edited] (SPARK-21391) Cannot convert a Seq of Map whose value type is again a seq, into a dataset
[ https://issues.apache.org/jira/browse/SPARK-21391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085099#comment-16085099 ] Kazuaki Ishizaki edited comment on SPARK-21391 at 7/13/17 3:42 AM: --- [~hyukjin.kwon] I think that SPARK-19254 and/or SPARK-19104 fixed this issue. was (Author: kiszk): [~hyukjin.kwon] I think that [SPARK-19254|https://issues.apache.org/jira/browse/SPARK-19254] and/or [SPARK-19104|https://issues.apache.org/jira/browse/SPARK-19104] fixed this issue. > Cannot convert a Seq of Map whose value type is again a seq, into a dataset > > > Key: SPARK-21391 > URL: https://issues.apache.org/jira/browse/SPARK-21391 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 > Environment: Seen on mac OSX, scala 2.11, java 8 >Reporter: indraneel rao > > There is an error while trying to create a dataset from a sequence of Maps, > whose values have any kind of collections. Even when they are wrapped in a > case class. > Eg : The following piece of code throws an error: > > {code:java} > case class Values(values: Seq[Double]) > case class ItemProperties(properties:Map[String,Values]) > val l1 = List(ItemProperties( > Map( > "A1" -> Values(Seq(1.0,2.0)), > "B1" -> Values(Seq(44.0,55.0)) > ) > ), > ItemProperties( > Map( > "A2" -> Values(Seq(123.0,25.0)), > "B2" -> Values(Seq(445.0,35.0)) > ) > ) > ) > l1.toDS().show() > {code} > Heres the error: > 17/07/12 21:59:35 ERROR CodeGenerator: failed to compile: > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 65, Column 46: Expression "ExternalMapToCatalyst_value_isNull0" is not an > rvalue > /* 001 */ public java.lang.Object generate(Object[] references) { > /* 002 */ return new SpecificUnsafeProjection(references); > /* 003 */ } > /* 004 */ > /* 005 */ class SpecificUnsafeProjection extends > org.apache.spark.sql.catalyst.expressions.UnsafeProjection { > /* 006 */ > /* 007 */ private Object[] references; > /* 008 */ private boolean resultIsNull; > /* 009 */ private java.lang.String argValue; > /* 010 */ private Object[] values; > /* 011 */ private boolean resultIsNull1; > /* 012 */ private scala.collection.Seq argValue1; > /* 013 */ private boolean isNull11; > /* 014 */ private boolean value11; > /* 015 */ private boolean isNull12; > /* 016 */ private InternalRow value12; > /* 017 */ private boolean isNull13; > /* 018 */ private InternalRow value13; > /* 019 */ private UnsafeRow result; > /* 020 */ private > org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder holder; > /* 021 */ private > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter; > /* 022 */ private > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter > arrayWriter; > /* 023 */ private > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter > arrayWriter1; > /* 024 */ private > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter1; > /* 025 */ private > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter > arrayWriter2; > /* 026 */ > /* 027 */ public SpecificUnsafeProjection(Object[] references) { > /* 028 */ this.references = references; > /* 029 */ > /* 030 */ > /* 031 */ this.values = null; > /* 032 */ > /* 033 */ > /* 034 */ isNull11 = false; > /* 035 */ value11 = false; > /* 036 */ isNull12 = false; > /* 037 */ value12 = null; > /* 038 */ isNull13 = false; > /* 039 */ value13 = null; > /* 040 */ result = new UnsafeRow(1); > /* 041 */ this.holder = new > 
org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(result, 32); > /* 042 */ this.rowWriter = new > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1); > /* 043 */ this.arrayWriter = new > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter(); > /* 044 */ this.arrayWriter1 = new > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter(); > /* 045 */ this.rowWriter1 = new > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1); > /* 046 */ this.arrayWriter2 = new > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter(); > /* 047 */ > /* 048 */ } > /* 049 */ > /* 050 */ public void initialize(int partitionIndex) { > /* 051 */ > /* 052 */ } > /* 053 */ > /* 054 */ > /* 055 */ private void evalIfTrueExpr(InternalRow i) { > /* 056 */ final InternalRow value7 = null; > /* 057 */ isNull12 = true; > /* 058 */ value12 = value7; > /* 059 */ } > /* 060 */ > /* 061 */ > /
[jira] [Commented] (SPARK-21391) Cannot convert a Seq of Map whose value type is again a seq, into a dataset
[ https://issues.apache.org/jira/browse/SPARK-21391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085099#comment-16085099 ] Kazuaki Ishizaki commented on SPARK-21391: -- [~hyukjin.kwon] I think that [SPARK-19254|https://issues.apache.org/jira/browse/SPARK-19254] and/or [SPARK-19104|https://issues.apache.org/jira/browse/SPARK-19104] fixed this issue. > Cannot convert a Seq of Map whose value type is again a seq, into a dataset > > > Key: SPARK-21391 > URL: https://issues.apache.org/jira/browse/SPARK-21391 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 > Environment: Seen on mac OSX, scala 2.11, java 8 >Reporter: indraneel rao > > There is an error while trying to create a dataset from a sequence of Maps, > whose values have any kind of collections. Even when they are wrapped in a > case class. > Eg : The following piece of code throws an error: > > {code:java} > case class Values(values: Seq[Double]) > case class ItemProperties(properties:Map[String,Values]) > val l1 = List(ItemProperties( > Map( > "A1" -> Values(Seq(1.0,2.0)), > "B1" -> Values(Seq(44.0,55.0)) > ) > ), > ItemProperties( > Map( > "A2" -> Values(Seq(123.0,25.0)), > "B2" -> Values(Seq(445.0,35.0)) > ) > ) > ) > l1.toDS().show() > {code} > Heres the error: > 17/07/12 21:59:35 ERROR CodeGenerator: failed to compile: > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 65, Column 46: Expression "ExternalMapToCatalyst_value_isNull0" is not an > rvalue > /* 001 */ public java.lang.Object generate(Object[] references) { > /* 002 */ return new SpecificUnsafeProjection(references); > /* 003 */ } > /* 004 */ > /* 005 */ class SpecificUnsafeProjection extends > org.apache.spark.sql.catalyst.expressions.UnsafeProjection { > /* 006 */ > /* 007 */ private Object[] references; > /* 008 */ private boolean resultIsNull; > /* 009 */ private java.lang.String argValue; > /* 010 */ private Object[] values; > /* 011 */ private boolean resultIsNull1; > /* 012 */ private scala.collection.Seq argValue1; > /* 013 */ private boolean isNull11; > /* 014 */ private boolean value11; > /* 015 */ private boolean isNull12; > /* 016 */ private InternalRow value12; > /* 017 */ private boolean isNull13; > /* 018 */ private InternalRow value13; > /* 019 */ private UnsafeRow result; > /* 020 */ private > org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder holder; > /* 021 */ private > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter; > /* 022 */ private > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter > arrayWriter; > /* 023 */ private > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter > arrayWriter1; > /* 024 */ private > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter1; > /* 025 */ private > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter > arrayWriter2; > /* 026 */ > /* 027 */ public SpecificUnsafeProjection(Object[] references) { > /* 028 */ this.references = references; > /* 029 */ > /* 030 */ > /* 031 */ this.values = null; > /* 032 */ > /* 033 */ > /* 034 */ isNull11 = false; > /* 035 */ value11 = false; > /* 036 */ isNull12 = false; > /* 037 */ value12 = null; > /* 038 */ isNull13 = false; > /* 039 */ value13 = null; > /* 040 */ result = new UnsafeRow(1); > /* 041 */ this.holder = new > org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(result, 32); > /* 042 */ this.rowWriter = new > 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1); > /* 043 */ this.arrayWriter = new > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter(); > /* 044 */ this.arrayWriter1 = new > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter(); > /* 045 */ this.rowWriter1 = new > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1); > /* 046 */ this.arrayWriter2 = new > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter(); > /* 047 */ > /* 048 */ } > /* 049 */ > /* 050 */ public void initialize(int partitionIndex) { > /* 051 */ > /* 052 */ } > /* 053 */ > /* 054 */ > /* 055 */ private void evalIfTrueExpr(InternalRow i) { > /* 056 */ final InternalRow value7 = null; > /* 057 */ isNull12 = true; > /* 058 */ value12 = value7; > /* 059 */ } > /* 060 */ > /* 061 */ > /* 062 */ private void evalIfCondExpr(InternalRow i) { > /* 063 */ > /* 064 */ isNull11 = false; > /* 065 */ value11 = ExternalMapToCatalyst_
[jira] [Comment Edited] (SPARK-21391) Cannot convert a Seq of Map whose value type is again a seq, into a dataset
[ https://issues.apache.org/jira/browse/SPARK-21391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084333#comment-16084333 ] Kazuaki Ishizaki edited comment on SPARK-21391 at 7/12/17 5:19 PM: --- This program works with the master or Spark 2.2. Would it be possible to use Spark 2.2? {code} ++ | properties| ++ |Map(A1 -> [Wrappe...| |Map(A2 -> [Wrappe...| ++ {code} was (Author: kiszk): This program works with the master and Spark 2.2. Would it be possible to use Spark 2.2? {code} ++ | properties| ++ |Map(A1 -> [Wrappe...| |Map(A2 -> [Wrappe...| ++ {code} > Cannot convert a Seq of Map whose value type is again a seq, into a dataset > > > Key: SPARK-21391 > URL: https://issues.apache.org/jira/browse/SPARK-21391 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 > Environment: Seen on mac OSX, scala 2.11, java 8 >Reporter: indraneel rao > > There is an error while trying to create a dataset from a sequence of Maps, > whose values have any kind of collections. Even when they are wrapped in a > case class. > Eg : The following piece of code throws an error: > > {code:java} > case class Values(values: Seq[Double]) > case class ItemProperties(properties:Map[String,Values]) > val l1 = List(ItemProperties( > Map( > "A1" -> Values(Seq(1.0,2.0)), > "B1" -> Values(Seq(44.0,55.0)) > ) > ), > ItemProperties( > Map( > "A2" -> Values(Seq(123.0,25.0)), > "B2" -> Values(Seq(445.0,35.0)) > ) > ) > ) > l1.toDS().show() > {code} > Heres the error: > 17/07/12 21:59:35 ERROR CodeGenerator: failed to compile: > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 65, Column 46: Expression "ExternalMapToCatalyst_value_isNull0" is not an > rvalue > /* 001 */ public java.lang.Object generate(Object[] references) { > /* 002 */ return new SpecificUnsafeProjection(references); > /* 003 */ } > /* 004 */ > /* 005 */ class SpecificUnsafeProjection extends > org.apache.spark.sql.catalyst.expressions.UnsafeProjection { > /* 006 */ > /* 007 */ private Object[] references; > /* 008 */ private boolean resultIsNull; > /* 009 */ private java.lang.String argValue; > /* 010 */ private Object[] values; > /* 011 */ private boolean resultIsNull1; > /* 012 */ private scala.collection.Seq argValue1; > /* 013 */ private boolean isNull11; > /* 014 */ private boolean value11; > /* 015 */ private boolean isNull12; > /* 016 */ private InternalRow value12; > /* 017 */ private boolean isNull13; > /* 018 */ private InternalRow value13; > /* 019 */ private UnsafeRow result; > /* 020 */ private > org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder holder; > /* 021 */ private > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter; > /* 022 */ private > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter > arrayWriter; > /* 023 */ private > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter > arrayWriter1; > /* 024 */ private > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter1; > /* 025 */ private > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter > arrayWriter2; > /* 026 */ > /* 027 */ public SpecificUnsafeProjection(Object[] references) { > /* 028 */ this.references = references; > /* 029 */ > /* 030 */ > /* 031 */ this.values = null; > /* 032 */ > /* 033 */ > /* 034 */ isNull11 = false; > /* 035 */ value11 = false; > /* 036 */ isNull12 = false; > /* 037 */ value12 = null; > /* 038 */ isNull13 = false; > /* 039 */ value13 = null; > /* 040 */ 
result = new UnsafeRow(1); > /* 041 */ this.holder = new > org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(result, 32); > /* 042 */ this.rowWriter = new > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1); > /* 043 */ this.arrayWriter = new > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter(); > /* 044 */ this.arrayWriter1 = new > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter(); > /* 045 */ this.rowWriter1 = new > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1); > /* 046 */ this.arrayWriter2 = new > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter(); > /* 047 */ > /* 048 */ } > /* 049 */ > /* 050 */ public void initialize(int partitionIndex) { > /* 051 */ > /* 052 */ } > /* 053 */ > /* 0
[jira] [Comment Edited] (SPARK-21391) Cannot convert a Seq of Map whose value type is again a seq, into a dataset
[ https://issues.apache.org/jira/browse/SPARK-21391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084333#comment-16084333 ] Kazuaki Ishizaki edited comment on SPARK-21391 at 7/12/17 5:19 PM: --- This program works with the master and Spark 2.2. Would it be possible to use Spark 2.2? {code} ++ | properties| ++ |Map(A1 -> [Wrappe...| |Map(A2 -> [Wrappe...| ++ {code} was (Author: kiszk): This program works with the master. {code} ++ | properties| ++ |Map(A1 -> [Wrappe...| |Map(A2 -> [Wrappe...| ++ {code} > Cannot convert a Seq of Map whose value type is again a seq, into a dataset > > > Key: SPARK-21391 > URL: https://issues.apache.org/jira/browse/SPARK-21391 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 > Environment: Seen on mac OSX, scala 2.11, java 8 >Reporter: indraneel rao > > There is an error while trying to create a dataset from a sequence of Maps, > whose values have any kind of collections. Even when they are wrapped in a > case class. > Eg : The following piece of code throws an error: > > {code:java} > case class Values(values: Seq[Double]) > case class ItemProperties(properties:Map[String,Values]) > val l1 = List(ItemProperties( > Map( > "A1" -> Values(Seq(1.0,2.0)), > "B1" -> Values(Seq(44.0,55.0)) > ) > ), > ItemProperties( > Map( > "A2" -> Values(Seq(123.0,25.0)), > "B2" -> Values(Seq(445.0,35.0)) > ) > ) > ) > l1.toDS().show() > {code} > Heres the error: > 17/07/12 21:59:35 ERROR CodeGenerator: failed to compile: > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 65, Column 46: Expression "ExternalMapToCatalyst_value_isNull0" is not an > rvalue > /* 001 */ public java.lang.Object generate(Object[] references) { > /* 002 */ return new SpecificUnsafeProjection(references); > /* 003 */ } > /* 004 */ > /* 005 */ class SpecificUnsafeProjection extends > org.apache.spark.sql.catalyst.expressions.UnsafeProjection { > /* 006 */ > /* 007 */ private Object[] references; > /* 008 */ private boolean resultIsNull; > /* 009 */ private java.lang.String argValue; > /* 010 */ private Object[] values; > /* 011 */ private boolean resultIsNull1; > /* 012 */ private scala.collection.Seq argValue1; > /* 013 */ private boolean isNull11; > /* 014 */ private boolean value11; > /* 015 */ private boolean isNull12; > /* 016 */ private InternalRow value12; > /* 017 */ private boolean isNull13; > /* 018 */ private InternalRow value13; > /* 019 */ private UnsafeRow result; > /* 020 */ private > org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder holder; > /* 021 */ private > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter; > /* 022 */ private > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter > arrayWriter; > /* 023 */ private > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter > arrayWriter1; > /* 024 */ private > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter1; > /* 025 */ private > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter > arrayWriter2; > /* 026 */ > /* 027 */ public SpecificUnsafeProjection(Object[] references) { > /* 028 */ this.references = references; > /* 029 */ > /* 030 */ > /* 031 */ this.values = null; > /* 032 */ > /* 033 */ > /* 034 */ isNull11 = false; > /* 035 */ value11 = false; > /* 036 */ isNull12 = false; > /* 037 */ value12 = null; > /* 038 */ isNull13 = false; > /* 039 */ value13 = null; > /* 040 */ result = new UnsafeRow(1); > /* 041 */ this.holder = 
new > org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(result, 32); > /* 042 */ this.rowWriter = new > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1); > /* 043 */ this.arrayWriter = new > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter(); > /* 044 */ this.arrayWriter1 = new > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter(); > /* 045 */ this.rowWriter1 = new > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1); > /* 046 */ this.arrayWriter2 = new > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter(); > /* 047 */ > /* 048 */ } > /* 049 */ > /* 050 */ public void initialize(int partitionIndex) { > /* 051 */ > /* 052 */ } > /* 053 */ > /* 054 */ > /* 055 */ private void evalIfTrueExpr(Inter
[jira] [Commented] (SPARK-21391) Cannot convert a Seq of Map whose value type is again a seq, into a dataset
[ https://issues.apache.org/jira/browse/SPARK-21391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084333#comment-16084333 ] Kazuaki Ishizaki commented on SPARK-21391: -- This program works with the master. {code} ++ | properties| ++ |Map(A1 -> [Wrappe...| |Map(A2 -> [Wrappe...| ++ {code} > Cannot convert a Seq of Map whose value type is again a seq, into a dataset > > > Key: SPARK-21391 > URL: https://issues.apache.org/jira/browse/SPARK-21391 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 > Environment: Seen on mac OSX, scala 2.11, java 8 >Reporter: indraneel rao > > There is an error while trying to create a dataset from a sequence of Maps, > whose values have any kind of collections. Even when they are wrapped in a > case class. > Eg : The following piece of code throws an error: > > {code:java} > case class Values(values: Seq[Double]) > case class ItemProperties(properties:Map[String,Values]) > val l1 = List(ItemProperties( > Map( > "A1" -> Values(Seq(1.0,2.0)), > "B1" -> Values(Seq(44.0,55.0)) > ) > ), > ItemProperties( > Map( > "A2" -> Values(Seq(123.0,25.0)), > "B2" -> Values(Seq(445.0,35.0)) > ) > ) > ) > l1.toDS().show() > {code} > Heres the error: > 17/07/12 21:59:35 ERROR CodeGenerator: failed to compile: > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 65, Column 46: Expression "ExternalMapToCatalyst_value_isNull0" is not an > rvalue > /* 001 */ public java.lang.Object generate(Object[] references) { > /* 002 */ return new SpecificUnsafeProjection(references); > /* 003 */ } > /* 004 */ > /* 005 */ class SpecificUnsafeProjection extends > org.apache.spark.sql.catalyst.expressions.UnsafeProjection { > /* 006 */ > /* 007 */ private Object[] references; > /* 008 */ private boolean resultIsNull; > /* 009 */ private java.lang.String argValue; > /* 010 */ private Object[] values; > /* 011 */ private boolean resultIsNull1; > /* 012 */ private scala.collection.Seq argValue1; > /* 013 */ private boolean isNull11; > /* 014 */ private boolean value11; > /* 015 */ private boolean isNull12; > /* 016 */ private InternalRow value12; > /* 017 */ private boolean isNull13; > /* 018 */ private InternalRow value13; > /* 019 */ private UnsafeRow result; > /* 020 */ private > org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder holder; > /* 021 */ private > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter; > /* 022 */ private > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter > arrayWriter; > /* 023 */ private > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter > arrayWriter1; > /* 024 */ private > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter1; > /* 025 */ private > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter > arrayWriter2; > /* 026 */ > /* 027 */ public SpecificUnsafeProjection(Object[] references) { > /* 028 */ this.references = references; > /* 029 */ > /* 030 */ > /* 031 */ this.values = null; > /* 032 */ > /* 033 */ > /* 034 */ isNull11 = false; > /* 035 */ value11 = false; > /* 036 */ isNull12 = false; > /* 037 */ value12 = null; > /* 038 */ isNull13 = false; > /* 039 */ value13 = null; > /* 040 */ result = new UnsafeRow(1); > /* 041 */ this.holder = new > org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(result, 32); > /* 042 */ this.rowWriter = new > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1); > /* 043 */ 
this.arrayWriter = new > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter(); > /* 044 */ this.arrayWriter1 = new > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter(); > /* 045 */ this.rowWriter1 = new > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1); > /* 046 */ this.arrayWriter2 = new > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter(); > /* 047 */ > /* 048 */ } > /* 049 */ > /* 050 */ public void initialize(int partitionIndex) { > /* 051 */ > /* 052 */ } > /* 053 */ > /* 054 */ > /* 055 */ private void evalIfTrueExpr(InternalRow i) { > /* 056 */ final InternalRow value7 = null; > /* 057 */ isNull12 = true; > /* 058 */ value12 = value7; > /* 059 */ } > /* 060 */ > /* 061 */ > /* 062 */ private void evalIfCondExpr(InternalRow i) { > /* 063 */ > /* 064 */ isNull11 = false; > /* 065 */ value11 = ExternalMapToCata
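For reference, the reporter's program as a self-contained application (a sketch under the assumption of a local master; per the comments above, it is expected to print the two rows on Spark 2.2+ rather than fail code generation):
{code:java}
import org.apache.spark.sql.SparkSession

object Spark21391Repro {
  case class Values(values: Seq[Double])
  case class ItemProperties(properties: Map[String, Values])

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("SPARK-21391").getOrCreate()
    import spark.implicits._

    // A Seq of case classes whose Map values themselves wrap a Seq;
    // on Spark 2.1 this failed inside ExternalMapToCatalyst code generation.
    val l1 = Seq(
      ItemProperties(Map("A1" -> Values(Seq(1.0, 2.0)), "B1" -> Values(Seq(44.0, 55.0)))),
      ItemProperties(Map("A2" -> Values(Seq(123.0, 25.0)), "B2" -> Values(Seq(445.0, 35.0))))
    )
    l1.toDS().show()

    spark.stop()
  }
}
{code}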
[jira] [Commented] (SPARK-21390) Dataset filter api inconsistency
[ https://issues.apache.org/jira/browse/SPARK-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084306#comment-16084306 ] Kazuaki Ishizaki commented on SPARK-21390: -- Some more interesting results with Spark 2.2:
On IDE
{code:java}
{ ...
  filterMe1.filter(x => filterCondition.contains(x)).show
  filterMe1.filter(x => filterCondition.contains(SomeClass(x.field1, x.field2))).show
}
+------+------+
|field1|field2|
+------+------+
|    00|    01|
+------+------+

+------+------+
|field1|field2|
+------+------+
|    00|    01|
+------+------+
{code}
On REPL
{code:java}
...
scala> filterMe1.filter(x => filterCondition.contains(x)).show
+------+------+
|field1|field2|
+------+------+
|    00|    01|
+------+------+

scala> filterMe1.filter(x => filterCondition.contains(SomeClass(x.field1, x.field2))).show
+------+------+
|field1|field2|
+------+------+
+------+------+

scala> print(filterCondition.contains(SomeClass("00", "01")))
true

scala> filterMe1.filter(x => { val c = filterCondition.contains(SomeClass(x.field1, x.field2)); print(s"$c\n"); c }).show
false
+------+------+
|field1|field2|
+------+------+
+------+------+
{code}
> Dataset filter api inconsistency > > > Key: SPARK-21390 > URL: https://issues.apache.org/jira/browse/SPARK-21390 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.1 >Reporter: Gheorghe Gheorghe >Priority: Minor > > Hello everybody, > I've encountered a strange situation with the spark-shell. > When I run the code below in my IDE the second test case prints the expected > count "1". However, when I run the same code using the spark-shell in the > second test case I get 0 back as a count. > I've made sure that I'm running scala 2.11.8 and spark 2.0.1 in both my IDE > and spark-shell. > {code:java} > import org.apache.spark.sql.Dataset > case class SomeClass(field1:String, field2:String) > val filterCondition: Seq[SomeClass] = Seq( SomeClass("00", "01") ) > // Test 1 > val filterMe1: Dataset[SomeClass] = Seq( SomeClass("00", "01") ).toDS > > println("Works fine!" +filterMe1.filter(filterCondition.contains(_)).count) > > // Test 2 > case class OtherClass(field1:String, field2:String) > > val filterMe2 = Seq( OtherClass("00", "01"), OtherClass("00", "02")).toDS > println("Fail, count should return 1: " + filterMe2.filter(x=> > filterCondition.contains(SomeClass(x.field1, x.field2))).count) > {code} > Note if I transform the dataset first I get 1 back as expected. > {code:java} > println(filterMe2.map(x=> SomeClass(x.field1, > x.field2)).filter(filterCondition.contains(_)).count) > {code} > Is this a bug? I can see that this filter function has been marked as > experimental > https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-21390) Dataset filter api inconsistency
[ https://issues.apache.org/jira/browse/SPARK-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084306#comment-16084306 ] Kazuaki Ishizaki edited comment on SPARK-21390 at 7/12/17 5:09 PM: --- Some more interesting results with Spark 2.2. Does this happen only for case classes on the REPL?
On IDE
{code:java}
{ ...
  filterMe1.filter(x => filterCondition.contains(x)).show
  filterMe1.filter(x => filterCondition.contains(SomeClass(x.field1, x.field2))).show
}
+------+------+
|field1|field2|
+------+------+
|    00|    01|
+------+------+

+------+------+
|field1|field2|
+------+------+
|    00|    01|
+------+------+
{code}
On REPL
{code:java}
...
scala> filterMe1.filter(x => filterCondition.contains(x)).show
+------+------+
|field1|field2|
+------+------+
|    00|    01|
+------+------+

scala> filterMe1.filter(x => filterCondition.contains(SomeClass(x.field1, x.field2))).show
+------+------+
|field1|field2|
+------+------+
+------+------+

scala> print(filterCondition.contains(SomeClass("00", "01")))
true

scala> filterMe1.filter(x => { val c = filterCondition.contains(SomeClass(x.field1, x.field2)); print(s"$c\n"); c }).show
false
+------+------+
|field1|field2|
+------+------+
+------+------+

scala> Seq((0, 0), (1, 1), (2, 2)).toDS.filter(x => { val c = Seq((1, 1)).contains((x._1, x._2)); print(s"$c\n"); c }).show
false
true
false
+---+---+
| _1| _2|
+---+---+
|  1|  1|
+---+---+
{code}
was (Author: kiszk): Some more interesting results with Spark 2.2:
On IDE
{code:java}
{ ...
  filterMe1.filter(x => filterCondition.contains(x)).show
  filterMe1.filter(x => filterCondition.contains(SomeClass(x.field1, x.field2))).show
}
+------+------+
|field1|field2|
+------+------+
|    00|    01|
+------+------+

+------+------+
|field1|field2|
+------+------+
|    00|    01|
+------+------+
{code}
On REPL
{code:java}
...
scala> filterMe1.filter(x => filterCondition.contains(x)).show
+------+------+
|field1|field2|
+------+------+
|    00|    01|
+------+------+

scala> filterMe1.filter(x => filterCondition.contains(SomeClass(x.field1, x.field2))).show
+------+------+
|field1|field2|
+------+------+
+------+------+

scala> print(filterCondition.contains(SomeClass("00", "01")))
true

scala> filterMe1.filter(x => { val c = filterCondition.contains(SomeClass(x.field1, x.field2)); print(s"$c\n"); c }).show
false
+------+------+
|field1|field2|
+------+------+
+------+------+
{code}
> Dataset filter api inconsistency > > > Key: SPARK-21390 > URL: https://issues.apache.org/jira/browse/SPARK-21390 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.1 >Reporter: Gheorghe Gheorghe >Priority: Minor > > Hello everybody, > I've encountered a strange situation with the spark-shell. > When I run the code below in my IDE the second test case prints the expected > count "1". However, when I run the same code using the spark-shell in the > second test case I get 0 back as a count. > I've made sure that I'm running scala 2.11.8 and spark 2.0.1 in both my IDE > and spark-shell. > {code:java} > import org.apache.spark.sql.Dataset > case class SomeClass(field1:String, field2:String) > val filterCondition: Seq[SomeClass] = Seq( SomeClass("00", "01") ) > // Test 1 > val filterMe1: Dataset[SomeClass] = Seq( SomeClass("00", "01") ).toDS > > println("Works fine!" +filterMe1.filter(filterCondition.contains(_)).count) > > // Test 2 > case class OtherClass(field1:String, field2:String) > > val filterMe2 = Seq( OtherClass("00", "01"), OtherClass("00", "02")).toDS > println("Fail, count should return 1: " + filterMe2.filter(x=> > filterCondition.contains(SomeClass(x.field1, x.field2))).count) > {code} > Note if I transform the dataset first I get 1 back as expected. 
> {code:java} > println(filterMe2.map(x=> SomeClass(x.field1, > x.field2)).filter(filterCondition.contains(_)).count) > {code} > Is this a bug? I can see that this filter function has been marked as > experimental > https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21390) Dataset filter api inconsistency
[ https://issues.apache.org/jira/browse/SPARK-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084266#comment-16084266 ] Kazuaki Ishizaki commented on SPARK-21390: -- Thank you for reporting this. I can reproduce this using Spark 2.2, too.
{code:java}
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131)
Type in expressions to have them evaluated.
Type :help for more information.

scala> import org.apache.spark.sql.Dataset
import org.apache.spark.sql.Dataset

scala> case class SomeClass(field1:String, field2:String)
defined class SomeClass

scala> val filterCondition: Seq[SomeClass] = Seq( SomeClass("00", "01") )
filterCondition: Seq[SomeClass] = List(SomeClass(00,01))

scala> val filterMe1: Dataset[SomeClass] = Seq( SomeClass("00", "01") ).toDS
filterMe1: org.apache.spark.sql.Dataset[SomeClass] = [field1: string, field2: string]

scala> println("Works fine!" + filterMe1.filter(filterCondition.contains(_)).count)
Works fine!1

scala> case class OtherClass(field1:String, field2:String)
defined class OtherClass

scala> val filterMe2 = Seq( OtherClass("00", "01"), OtherClass("00", "02")).toDS
filterMe2: org.apache.spark.sql.Dataset[OtherClass] = [field1: string, field2: string]

scala> println("Fail, count should return 1: " + filterMe2.filter(x => filterCondition.contains(SomeClass(x.field1, x.field2))).count)
Fail, count should return 1: 0
{code}
> Dataset filter api inconsistency > > > Key: SPARK-21390 > URL: https://issues.apache.org/jira/browse/SPARK-21390 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.1 >Reporter: Gheorghe Gheorghe >Priority: Minor > > Hello everybody, > I've encountered a strange situation with the spark-shell. > When I run the code below in my IDE the second test case prints the expected > count "1". However, when I run the same code using the spark-shell in the > second test case I get 0 back as a count. > I've made sure that I'm running scala 2.11.8 and spark 2.0.1 in both my IDE > and spark-shell. > {code:java} > import org.apache.spark.sql.Dataset > case class SomeClass(field1:String, field2:String) > val filterCondition: Seq[SomeClass] = Seq( SomeClass("00", "01") ) > // Test 1 > val filterMe1: Dataset[SomeClass] = Seq( SomeClass("00", "01") ).toDS > > println("Works fine!" +filterMe1.filter(filterCondition.contains(_)).count) > > // Test 2 > case class OtherClass(field1:String, field2:String) > > val filterMe2 = Seq( OtherClass("00", "01"), OtherClass("00", "02")).toDS > println("Fail, count should return 1: " + filterMe2.filter(x=> > filterCondition.contains(SomeClass(x.field1, x.field2))).count) > {code} > Note if I transform the dataset first I get 1 back as expected. > {code:java} > println(filterMe2.map(x=> SomeClass(x.field1, > x.field2)).filter(filterCondition.contains(_)).count) > {code} > Is this a bug? I can see that this filter function has been marked as > experimental > https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1) -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
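The report itself points at the workaround: convert the Dataset to the filter condition's case class before filtering, instead of constructing SomeClass inside the filter closure. A minimal sketch of that workaround as a spark-shell session (same definitions as in the report):
{code:java}
case class SomeClass(field1: String, field2: String)
case class OtherClass(field1: String, field2: String)

val filterCondition: Seq[SomeClass] = Seq(SomeClass("00", "01"))
val filterMe2 = Seq(OtherClass("00", "01"), OtherClass("00", "02")).toDS

// Transform to the condition's type first, then filter on that type;
// this returns 1 as expected, even on the REPL.
val n = filterMe2
  .map(x => SomeClass(x.field1, x.field2))
  .filter(filterCondition.contains(_))
  .count
{code}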
[jira] [Created] (SPARK-21387) org.apache.spark.memory.TaskMemoryManager.allocatePage causes OOM
Kazuaki Ishizaki created SPARK-21387: Summary: org.apache.spark.memory.TaskMemoryManager.allocatePage causes OOM Key: SPARK-21387 URL: https://issues.apache.org/jira/browse/SPARK-21387 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.3.0 Reporter: Kazuaki Ishizaki -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-21373) Update Jetty to 9.3.20.v20170531
[ https://issues.apache.org/jira/browse/SPARK-21373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16082426#comment-16082426 ] Kazuaki Ishizaki edited comment on SPARK-21373 at 7/11/17 3:56 PM: --- Since I have not confirmed the impact of the CVE, I changed the title. was (Author: kiszk): Since I have not confirmed the impact of the CVE, I changed the title. I will submit a PR for improvement. > Update Jetty to 9.3.20.v20170531 > > > Key: SPARK-21373 > URL: https://issues.apache.org/jira/browse/SPARK-21373 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Kazuaki Ishizaki >Priority: Minor > > This is derived from https://issues.apache.org/jira/browse/FELIX-5664. > [~aroberts] let me know about the CVE. > Spark 2.2 uses jetty 9.3.11.v20160721, which is vulnerable to CVE-2017-9735 > * https://nvd.nist.gov/vuln/detail/CVE-2017-9735 > * https://github.com/eclipse/jetty.project/issues/1556 > We should upgrade jetty to 9.3.20.v20170531, which has been released to fix the CVE. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21373) Update Jetty to 9.3.20.v20170531
[ https://issues.apache.org/jira/browse/SPARK-21373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki updated SPARK-21373: - Summary: Update Jetty to 9.3.20.v20170531 (was: Update Jetty to 9.3.20.v20170531 to fix CVE-2017-9735) > Update Jetty to 9.3.20.v20170531 > > > Key: SPARK-21373 > URL: https://issues.apache.org/jira/browse/SPARK-21373 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Kazuaki Ishizaki >Priority: Minor > > This is derived from https://issues.apache.org/jira/browse/FELIX-5664. > [~aroberts] let me know about the CVE. > Spark 2.2 uses jetty 9.3.11.v20160721, which is vulnerable to CVE-2017-9735 > * https://nvd.nist.gov/vuln/detail/CVE-2017-9735 > * https://github.com/eclipse/jetty.project/issues/1556 > We should upgrade jetty to 9.3.20.v20170531, which has been released to fix the CVE. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21373) Update Jetty to 9.3.20.v20170531
[ https://issues.apache.org/jira/browse/SPARK-21373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16082426#comment-16082426 ] Kazuaki Ishizaki commented on SPARK-21373: -- Since I have not confirmed the impact of the CVE, I changed the title. I will submit a PR for improvement. > Update Jetty to 9.3.20.v20170531 > > > Key: SPARK-21373 > URL: https://issues.apache.org/jira/browse/SPARK-21373 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Kazuaki Ishizaki >Priority: Minor > > This is derived from https://issues.apache.org/jira/browse/FELIX-5664. > [~aroberts] let me know about the CVE. > Spark 2.2 uses jetty 9.3.11.v20160721, which is vulnerable to CVE-2017-9735 > * https://nvd.nist.gov/vuln/detail/CVE-2017-9735 > * https://github.com/eclipse/jetty.project/issues/1556 > We should upgrade jetty to 9.3.20.v20170531, which has been released to fix the CVE. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-21373) Update Jetty to 9.3.20.v20170531 to fix CVE-2017-9735
Kazuaki Ishizaki created SPARK-21373: Summary: Update Jetty to 9.3.20.v20170531 to fix CVE-2017-9735 Key: SPARK-21373 URL: https://issues.apache.org/jira/browse/SPARK-21373 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.2.1, 2.3.0 Reporter: Kazuaki Ishizaki This is derived from https://issues.apache.org/jira/browse/FELIX-5664 Spark 2.2 uses jetty 9.3.11.v20160721, which is vulnerable to CVE-2017-9735 * https://nvd.nist.gov/vuln/detail/CVE-2017-9735 * https://github.com/eclipse/jetty.project/issues/1556 We should upgrade jetty to 9.3.20.v20170531, which has been released to fix the CVE. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21373) Update Jetty to 9.3.20.v20170531 to fix CVE-2017-9735
[ https://issues.apache.org/jira/browse/SPARK-21373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki updated SPARK-21373: - Description: This is derived from https://issues.apache.org/jira/browse/FELIX-5664. [~aroberts] let me know about the CVE. Spark 2.2 uses jetty 9.3.11.v20160721, which is vulnerable to CVE-2017-9735 * https://nvd.nist.gov/vuln/detail/CVE-2017-9735 * https://github.com/eclipse/jetty.project/issues/1556 We should upgrade jetty to 9.3.20.v20170531, which has been released to fix the CVE. was: This is derived from https://issues.apache.org/jira/browse/FELIX-5664 Spark 2.2 uses jetty 9.3.11.v20160721, which is vulnerable to CVE-2017-9735 * https://nvd.nist.gov/vuln/detail/CVE-2017-9735 * https://github.com/eclipse/jetty.project/issues/1556 We should upgrade jetty to 9.3.20.v20170531, which has been released to fix the CVE. > Update Jetty to 9.3.20.v20170531 to fix CVE-2017-9735 > - > > Key: SPARK-21373 > URL: https://issues.apache.org/jira/browse/SPARK-21373 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.1, 2.3.0 >Reporter: Kazuaki Ishizaki > > This is derived from https://issues.apache.org/jira/browse/FELIX-5664. > [~aroberts] let me know about the CVE. > Spark 2.2 uses jetty 9.3.11.v20160721, which is vulnerable to CVE-2017-9735 > * https://nvd.nist.gov/vuln/detail/CVE-2017-9735 > * https://github.com/eclipse/jetty.project/issues/1556 > We should upgrade jetty to 9.3.20.v20170531, which has been released to fix the CVE. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
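For background, CVE-2017-9735 is a timing side channel: Jetty's password check compared strings character by character and returned early on the first mismatch, so response times leaked how many leading characters of a guess were correct. A minimal sketch of the usual remedy, a constant-time comparison that always inspects every byte (illustrative only, not Jetty's actual patch):
{code:java}
// Constant-time equality: the running time does not depend on where
// the first mismatching byte occurs.
def constantTimeEquals(a: Array[Byte], b: Array[Byte]): Boolean = {
  if (a.length != b.length) {
    false
  } else {
    var diff = 0
    for (i <- a.indices) {
      diff |= (a(i) ^ b(i)) // accumulate differences without branching
    }
    diff == 0
  }
}
{code}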
[jira] [Commented] (SPARK-21364) IndexOutOfBoundsException on equality check of two complex array elements
[ https://issues.apache.org/jira/browse/SPARK-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16080963#comment-16080963 ] Kazuaki Ishizaki commented on SPARK-21364: -- When I ran the following test case, derived from the repro, I got the result without any exception on the master or 2.1.1. Am I making a mistake?
{code}
test("SPARK-21364") {
  val data = Seq(
    "{\"menu\":{\"id\":\"file\",\"value\":\"File\",\"popup\":{\"menuitem\":[" +
      "{\"value\":\"New\",\"onclick\":\"CreateNewDoc()\"}," +
      "{\"value\":\"Open\",\"onclick\":\"OpenDoc()\"}, " +
      "{\"value\":\"Close\",\"onclick\":\"CloseDoc()\"}" +
      "]}}}")
  val df = sqlContext.read.json(sparkContext.parallelize(data))
  df.select($"menu.popup.menuitem"(lit(0)).===($"menu.popup.menuitem"(lit(1)))).show
}
{code}
{code}
+-------------------------------------------------+
|(menu.popup.menuitem[0] = menu.popup.menuitem[1])|
+-------------------------------------------------+
|                                            false|
+-------------------------------------------------+
{code}
> IndexOutOfBoundsException on equality check of two complex array elements > - > > Key: SPARK-21364 > URL: https://issues.apache.org/jira/browse/SPARK-21364 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 >Reporter: Vivek Patangiwar >Priority: Minor > > Getting an IndexOutOfBoundsException with the following code: > import org.apache.spark.sql.functions._ > import org.apache.spark.sql.SparkSession > object ArrayEqualityTest { > def main(s:Array[String]) { > val sparkSession = > SparkSession.builder().master("local[*]").appName("app").getOrCreate() > val sqlContext = sparkSession.sqlContext > val sc = sparkSession.sqlContext.sparkContext > import sparkSession.implicits._ > val df = > sqlContext.read.json(sc.parallelize(Seq("{\"menu\":{\"id\":\"file\",\"value\":\"File\",\"popup\":{\"menuitem\":[{\"value\":\"New\",\"onclick\":\"CreateNewDoc()\"},{\"value\":\"Open\",\"onclick\":\"OpenDoc()\"},{\"value\":\"Close\",\"onclick\":\"CloseDoc()\"}]}}}"))) > > df.select($"menu.popup.menuitem"(lit(0)).===($"menu.popup.menuitem"(lit(1)))).show > } > } > Here's the complete stack-trace: > Exception in thread "main" java.lang.IndexOutOfBoundsException: 1 > at > scala.collection.LinearSeqOptimized$class.apply(LinearSeqOptimized.scala:65) > at scala.collection.immutable.List.apply(List.scala:84) > at > org.apache.spark.sql.catalyst.expressions.BoundReference.doGenCode(BoundAttribute.scala:64) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104) > at > org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:101) > at > org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$$anonfun$3.apply(GenerateOrdering.scala:76) > at > org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$$anonfun$3.apply(GenerateOrdering.scala:75) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.AbstractTraversable.map(Traversable.scala:104) > at > org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.genComparisons(GenerateOrdering.scala:75) > at > 
org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.genComparisons(GenerateOrdering.scala:68) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext.genComp(CodeGenerator.scala:559) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext.genEqual(CodeGenerator.scala:486) > at > org.apache.spark.sql.catalyst.expressions.EqualTo$$anonfun$doGenCode$4.apply(predicates.scala:437) > at > org.apache.spark.sql.catalyst.expressions.EqualTo$$anonfun$doGenCode$4.apply(predicates.scala:437) > at > org.apache.spark.sql.catalyst.expressions.BinaryExpression$$anonfun$defineCodeGen$2.apply(Expression.scala:442) > at > org.apache.spark.sql.catalyst.expressions.BinaryExpression$$anonfun$defineCodeGen$2.apply(Expression.scala:441) > at > org.apache.spark.sql.catalyst.ex
[jira] [Commented] (SPARK-21337) SQL which has large ‘case when’ expressions may cause code generation beyond 64KB
[ https://issues.apache.org/jira/browse/SPARK-21337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16079032#comment-16079032 ] Kazuaki Ishizaki commented on SPARK-21337: -- I cannot reproduce this with either the latest branch-2.1 or the v2.1 tag. Is this issue only for CDH? > SQL which has large ‘case when’ expressions may cause code generation beyond > 64KB > - > > Key: SPARK-21337 > URL: https://issues.apache.org/jira/browse/SPARK-21337 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.1 > Environment: spark-2.1.1-hadoop-2.6.0-cdh-5.4.2 >Reporter: fengchaoge > Fix For: 2.1.1 > > > When there are large 'case when' expressions in Spark SQL, the CodeGenerator > fails to compile them. > The error message is followed by a huge dump of generated source code, and > compilation ultimately fails. > java.util.concurrent.ExecutionException: java.lang.Exception: failed to > compile: org.codehaus.janino.JaninoRuntimeException: Code of method > "apply_9$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V" > of class > "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection" > grows beyond 64 KB. > It seems that SPARK-13242 solved this problem in spark-1.6.2; however, it > appears in spark-2.1.1 again. > https://issues.apache.org/jira/browse/SPARK-13242. > Is there something wrong? -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
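For anyone trying to reproduce this outside the CDH build, a sketch of how a query of the reported shape can be generated from the shell (the table name t, column name c, and branch count are made-up values, not taken from the report):
{code:java}
// Build a single-column projection with a very large CASE WHEN.
val branches = (1 to 3000).map(i => s"WHEN c = $i THEN ${i * 2}").mkString(" ")
val query = s"SELECT CASE $branches ELSE 0 END AS mapped FROM t"

// With enough branches, the generated projection method can exceed the
// JVM's 64KB-per-method bytecode limit unless codegen splits expressions.
spark.sql(query).show()
{code}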
[jira] [Commented] (SPARK-21344) BinaryType comparison does signed byte array comparison
[ https://issues.apache.org/jira/browse/SPARK-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16079010#comment-16079010 ] Kazuaki Ishizaki commented on SPARK-21344: -- I will work on this unless someone has already started a PR. > BinaryType comparison does signed byte array comparison > --- > > Key: SPARK-21344 > URL: https://issues.apache.org/jira/browse/SPARK-21344 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.1.1 >Reporter: Shubham Chopra > > BinaryType used by Spark SQL defines ordering using signed byte comparisons. > This can lead to unexpected behavior. Consider the following code snippet > that shows this error: > {code} > case class TestRecord(col0: Array[Byte]) > def convertToBytes(i: Long): Array[Byte] = { > val bb = java.nio.ByteBuffer.allocate(8) > bb.putLong(i) > bb.array > } > def test = { > val sql = spark.sqlContext > import sql.implicits._ > val timestamp = 1498772083037L > val data = (timestamp to timestamp + 1000L).map(i => > TestRecord(convertToBytes(i))) > val testDF = sc.parallelize(data).toDF > val filter1 = testDF.filter(col("col0") >= convertToBytes(timestamp) && > col("col0") < convertToBytes(timestamp + 50L)) > val filter2 = testDF.filter(col("col0") >= convertToBytes(timestamp + > 50L) && col("col0") < convertToBytes(timestamp + 100L)) > val filter3 = testDF.filter(col("col0") >= convertToBytes(timestamp) && > col("col0") < convertToBytes(timestamp + 100L)) > assert(filter1.count == 50) > assert(filter2.count == 50) > assert(filter3.count == 100) > } > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
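A plain-Scala sketch of the signed-versus-unsigned difference behind this report (not Spark's implementation): a byte with the high bit set is negative as a JVM byte, so a signed lexicographic comparison orders it before small positive bytes, while the unsigned, memcmp-style order expected for binary keys puts it after:
{code:java}
// Lexicographic byte-array comparison, signed or unsigned per element.
def compare(a: Array[Byte], b: Array[Byte], unsigned: Boolean): Int = {
  val n = math.min(a.length, b.length)
  var i = 0
  while (i < n) {
    val cmp =
      if (unsigned) Integer.compare(a(i) & 0xff, b(i) & 0xff) // 0..255
      else java.lang.Byte.compare(a(i), b(i)) // -128..127
    if (cmp != 0) return cmp
    i += 1
  }
  Integer.compare(a.length, b.length)
}

val x = Array(0x80.toByte) // -128 signed, 128 unsigned
val y = Array(0x01.toByte)
assert(compare(x, y, unsigned = false) < 0) // signed: x sorts before y
assert(compare(x, y, unsigned = true) > 0)  // unsigned: x sorts after y
{code}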
[jira] [Commented] (SPARK-21337) SQL which has large ‘case when’ expressions may cause code generation beyond 64KB
[ https://issues.apache.org/jira/browse/SPARK-21337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16077952#comment-16077952 ] Kazuaki Ishizaki commented on SPARK-21337: -- In the master branch, I do not see a huge dump and did not get a failure. If I am correct, should we backport a fix to 2.1.1?
{code}
test("split complex single column expressions") {
  val cases = 50
  val conditionClauses = 20

  // Generate an individual case
  def generateCase(n: Int): (Expression, Expression) = {
    val condition = (1 to conditionClauses)
      .map(c => EqualTo(BoundReference(0, StringType, false), Literal(s"$c:$n")))
      .reduceLeft[Expression]((l, r) => Or(l, r))
    (condition, Literal(n))
  }

  val expression = CaseWhen((1 to cases).map(generateCase(_)))

  // Currently this throws a java.util.concurrent.ExecutionException wrapping a
  // org.codehaus.janino.JaninoRuntimeException: Code of method XXX of class YYY grows beyond 64 KB
  val plan = GenerateMutableProjection.generate(Seq(expression))
}
{code}
> SQL which has large ‘case when’ expressions may cause code generation beyond > 64KB > - > > Key: SPARK-21337 > URL: https://issues.apache.org/jira/browse/SPARK-21337 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.1 > Environment: spark-2.1.1-hadoop-2.6.0-cdh-5.4.2 >Reporter: fengchaoge > Fix For: 2.1.1 > > > When there are large 'case when' expressions in Spark SQL, the CodeGenerator > fails to compile them. > The error message is followed by a huge dump of generated source code, and > compilation ultimately fails. > java.util.concurrent.ExecutionException: java.lang.Exception: failed to > compile: org.codehaus.janino.JaninoRuntimeException: Code of method > "apply_9$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V" > of class > "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection" > grows beyond 64 KB. > It seems like SPARK-13242 solved this problem in spark-1.6.1; however, it > appears in spark-2.1.1 again. > https://issues.apache.org/jira/browse/SPARK-13242. > Is there something wrong? -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org