[jira] [Created] (SPARK-22246) UnsafeRow, UnsafeArrayData, and UnsafeMapData use MemoryBlock

2017-10-11 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-22246:


 Summary: UnsafeRow, UnsafeArrayData, and UnsafeMapData use 
MemoryBlock
 Key: SPARK-22246
 URL: https://issues.apache.org/jira/browse/SPARK-22246
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.2.0
Reporter: Kazuaki Ishizaki


Using {{MemoryBlock}} can improve the flexibility of choosing the memory type and the 
runtime performance of memory accesses with {{Unsafe}}.
This JIRA entry proposes to use {{MemoryBlock}} in {{UnsafeRow}}, 
{{UnsafeArrayData}}, and {{UnsafeMapData}}.
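
As a rough illustration of the idea (a self-contained sketch, not Spark's actual {{MemoryBlock}} API; the class and method names below are hypothetical), the point is to bundle the (base object, offset, length) triple that {{Unsafe}}-style accessors currently pass around into a single object:

{code}
// Hypothetical sketch only: a MemoryBlock-like wrapper around (base object, offset, length).
// Real Spark code would delegate to org.apache.spark.unsafe.Platform for the actual accesses.
final class SimpleMemoryBlock(val base: Array[Byte], val offset: Int, val length: Int) {
  def getLong(pos: Int): Long =
    java.nio.ByteBuffer.wrap(base).order(java.nio.ByteOrder.nativeOrder()).getLong(offset + pos)

  def putLong(pos: Int, value: Long): Unit =
    java.nio.ByteBuffer.wrap(base).order(java.nio.ByteOrder.nativeOrder()).putLong(offset + pos, value)
}

object SimpleMemoryBlockExample extends App {
  // Callers pass one block instead of a separate base object and offset.
  val block = new SimpleMemoryBlock(new Array[Byte](16), 0, 16)
  block.putLong(0, 42L)
  println(block.getLong(0)) // 42
}
{code}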






[jira] [Commented] (SPARK-22226) splitExpression can create too many method calls (generating a Constant Pool limit error)

2017-10-09 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197272#comment-16197272
 ] 

Kazuaki Ishizaki commented on SPARK-22226:
--

You are right. [This PR|https://github.com/apache/spark/pull/16648] will not 
solve the issue regarding the large number of split methods. I missed the discussion we 
had [here|https://github.com/apache/spark/pull/19447].
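
To make the constant-pool concern concrete, below is a conceptual sketch (plain Scala, not Spark's actual generated code): even after a huge method is split into many small methods on a nested class, the outer class keeps one call site per split method, and each invocation contributes entries to the outer class's constant pool.

{code}
// Conceptual sketch only. In the real generated Java code, every `nested.apply_N(...)`
// call site in the outer class adds method-reference entries to its constant pool,
// so splitting alone does not remove the 64K-entry ceiling.
class NestedClass {
  def apply_0(row: Array[Double]): Unit = { /* expressions for columns 0..99 */ }
  def apply_1(row: Array[Double]): Unit = { /* expressions for columns 100..199 */ }
  // ... many more methods for a very wide schema
}

class MainClass(nested: NestedClass) {
  def apply(row: Array[Double]): Unit = {
    nested.apply_0(row) // one constant-pool entry per invocation target
    nested.apply_1(row)
    // ...
  }
}
{code}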

> splitExpression can create too many method calls (generating a Constant Pool 
> limit error)
> -
>
> Key: SPARK-22226
> URL: https://issues.apache.org/jira/browse/SPARK-22226
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Marco Gaido
>
> Code generation for very wide datasets can fail because the Constant Pool 
> limit is reached.
> This can have many causes. One of them is that we currently split 
> the definition of the generated methods among several 
> {{NestedClass}} instances, but all these methods are called from the main class. Since 
> an entry is added to the constant pool for each method invocation, this 
> limits the number of rows and, for very wide datasets, leads to:
> {noformat}
> org.codehaus.janino.JaninoRuntimeException: Constant pool for class 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificMutableProjection
>  has grown past JVM limit of 0x
> {noformat}






[jira] [Commented] (SPARK-22226) Code generation fails for dataframes with 10000 columns

2017-10-09 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197164#comment-16197164
 ] 

Kazuaki Ishizaki commented on SPARK-22226:
--

[This PR|https://github.com/apache/spark/pull/16648] addresses such an issue.

> Code generation fails for dataframes with 10000 columns
> ---
>
> Key: SPARK-22226
> URL: https://issues.apache.org/jira/browse/SPARK-22226
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Marco Gaido
>
> Code generation for very wide datasets can fail because the Constant Pool 
> limit is reached.
> This can have many causes. One of them is that we currently split 
> the definition of the generated methods among several 
> {{NestedClass}} instances, but all these methods are called from the main class. Since 
> an entry is added to the constant pool for each method invocation, this 
> limits the number of rows and, for very wide datasets, leads to:
> {noformat}
> org.codehaus.janino.JaninoRuntimeException: Constant pool for class 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificMutableProjection
>  has grown past JVM limit of 0x
> {noformat}
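
For reference, a hedged sketch of the kind of job that can hit this limit (the column count and expressions below are illustrative only; the exact threshold depends on the Spark version and the expressions involved):

{code}
// Illustrative only: select a very large number of derived columns so the generated
// projection class can grow past the constant-pool limit on affected versions.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").appName("wide-projection").getOrCreate()

val base = spark.range(10).toDF("id")
val manyCols = (0 until 10000).map(i => (col("id") + i).as(s"c$i"))
base.select(manyCols: _*).collect()  // may fail with "Constant pool ... has grown past JVM limit"
{code}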






[jira] [Created] (SPARK-22219) Refactor "spark.sql.codegen.comments"

2017-10-06 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-22219:


 Summary: Refactor "spark.sql.codegen.comments"
 Key: SPARK-22219
 URL: https://issues.apache.org/jira/browse/SPARK-22219
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.2.0
Reporter: Kazuaki Ishizaki
Priority: Minor


The current way to get the value of {{"spark.sql.codegen.comments"}} is not the latest approach. 
This refactoring uses a better approach to get the value of 
{{"spark.sql.codegen.comments"}}.






[jira] [Commented] (SPARK-19984) ERROR codegen.CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java'

2017-10-06 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16194937#comment-16194937
 ] 

Kazuaki Ishizaki commented on SPARK-19984:
--

[~JohnSteidley] Here is my program that produced the above plan. Any comments 
that help reproduce the problem are appreciated.

{code}
  test("SPARK-19984") {
    withSQLConf(
      "spark.sql.shuffle.partitions" -> "1",
      "spark.sql.join.preferSortMergeJoin" -> "true",
      "spark.sql.autoBroadcastJoinThreshold" -> "-1") {
      withTempPath { dir =>
        val t = sparkContext.parallelize(Seq("data1", "data2", "data3")).toDF("id")
        t.write.parquet(dir.getCanonicalPath)

        val df3 = sqlContext.read.parquet(dir.getCanonicalPath)

        val dfA = df3.limit(3)
        val dfB = dfA.select(col("id").alias("A"), col("id").alias("B"))
        val dfC = df3.select(col("id").alias("A"))
        val df1 = dfC.join(dfB, "A")
        val df = df1.groupBy().count()
        df.explain(true)
        df.collect
      }
    }
  }
{code}

> ERROR codegen.CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java'
> -
>
> Key: SPARK-19984
> URL: https://issues.apache.org/jira/browse/SPARK-19984
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 2.1.0
>Reporter: Andrey Yakovenko
> Attachments: after_adding_count.txt, before_adding_count.txt
>
>
> I had this error a few times on my local Hadoop 2.7.3 + Spark 2.1.0 environment. 
> This is not a permanent error; the next time I run it, it could disappear. 
> Unfortunately I don't know how to reproduce the issue. As you can see from 
> the log, my logic is pretty complicated.
> Here is a part of the log I've got (container_1489514660953_0015_01_01):
> {code}
> 17/03/16 11:07:04 ERROR codegen.CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 151, Column 29: A method named "compare" is not declared in any enclosing 
> class nor any supertype, nor through a static import
> /* 001 */ public Object generate(Object[] references) {
> /* 002 */   return new GeneratedIterator(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ final class GeneratedIterator extends 
> org.apache.spark.sql.execution.BufferedRowIterator {
> /* 006 */   private Object[] references;
> /* 007 */   private scala.collection.Iterator[] inputs;
> /* 008 */   private boolean agg_initAgg;
> /* 009 */   private boolean agg_bufIsNull;
> /* 010 */   private long agg_bufValue;
> /* 011 */   private boolean agg_initAgg1;
> /* 012 */   private boolean agg_bufIsNull1;
> /* 013 */   private long agg_bufValue1;
> /* 014 */   private scala.collection.Iterator smj_leftInput;
> /* 015 */   private scala.collection.Iterator smj_rightInput;
> /* 016 */   private InternalRow smj_leftRow;
> /* 017 */   private InternalRow smj_rightRow;
> /* 018 */   private UTF8String smj_value2;
> /* 019 */   private java.util.ArrayList smj_matches;
> /* 020 */   private UTF8String smj_value3;
> /* 021 */   private UTF8String smj_value4;
> /* 022 */   private org.apache.spark.sql.execution.metric.SQLMetric 
> smj_numOutputRows;
> /* 023 */   private UnsafeRow smj_result;
> /* 024 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder smj_holder;
> /* 025 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter 
> smj_rowWriter;
> /* 026 */   private org.apache.spark.sql.execution.metric.SQLMetric 
> agg_numOutputRows;
> /* 027 */   private org.apache.spark.sql.execution.metric.SQLMetric 
> agg_aggTime;
> /* 028 */   private UnsafeRow agg_result;
> /* 029 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder agg_holder;
> /* 030 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter 
> agg_rowWriter;
> /* 031 */   private org.apache.spark.sql.execution.metric.SQLMetric 
> agg_numOutputRows1;
> /* 032 */   private org.apache.spark.sql.execution.metric.SQLMetric 
> agg_aggTime1;
> /* 033 */   private UnsafeRow agg_result1;
> /* 034 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder agg_holder1;
> /* 035 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter 
> agg_rowWriter1;
> /* 036 */
> /* 037 */   public GeneratedIterator(Object[] references) {
> /* 038 */ this.references = references;
> /* 039 */   }
> /* 040 */
> /* 041 */   public void init(int index, scala.collection.Iterator[] inputs) {
> /* 042 */ partitionIndex = index;
> /* 043 */ this.inputs = inputs;
> /* 044 */ wholestagecodegen_init_0();
> /* 045 */ wholestagecodegen_init_1();
> /* 046 */
> /* 047 */   }
> /* 04

[jira] [Commented] (SPARK-19984) ERROR codegen.CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java'

2017-10-05 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16193201#comment-16193201
 ] 

Kazuaki Ishizaki commented on SPARK-19984:
--

[~JohnSteidley] Thank you for your comment. I updated my program and got the same 
physical plan as the one you provided. However, my program works on 
the Spark 2.1.2 RC and on the master branch.
While the physical plan tries to {{SortMergeJoin}} two strings, the generated 
code seems to {{SortMergeJoin}} a string and a long value for {{count(1)}}. I 
cannot understand that strange {{SortMergeJoin}}.

{code}
== Physical Plan ==
*HashAggregate(keys=[], functions=[count(1)], output=[count#35L])
+- *HashAggregate(keys=[], functions=[partial_count(1)], output=[count#40L])
   +- *Project
  +- *SortMergeJoin [A#25], [A#20], Inner
 :- *Sort [A#25 ASC NULLS FIRST], false, 0
 :  +- Exchange hashpartitioning(A#25, 1)
 : +- *Project [id#16 AS A#25]
 :+- *Filter isnotnull(id#16)
 :   +- *FileScan parquet [id#16] Batched: true, Format: 
Parquet, Location: InMemoryFileIndex[file:..., PartitionFilters: [], 
PushedFilters: [IsNotNull(id)], ReadSchema: struct
 +- *Sort [A#20 ASC NULLS FIRST], false, 0
+- *Project [id#16 AS A#20]
   +- *Filter isnotnull(id#16)
  +- *GlobalLimit 3
 +- Exchange SinglePartition
+- *LocalLimit 3
   +- *FileScan parquet [id#16] Batched: true, Format: 
Parquet, Location: InMemoryFileIndex[file:..., PartitionFilters: [], 
PushedFilters: [], ReadSchema: struct
{code}

> ERROR codegen.CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java'
> -
>
> Key: SPARK-19984
> URL: https://issues.apache.org/jira/browse/SPARK-19984
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 2.1.0
>Reporter: Andrey Yakovenko
> Attachments: after_adding_count.txt, before_adding_count.txt
>
>
> I had this error a few times on my local Hadoop 2.7.3 + Spark 2.1.0 environment. 
> This is not a permanent error; the next time I run it, it could disappear. 
> Unfortunately I don't know how to reproduce the issue. As you can see from 
> the log, my logic is pretty complicated.
> Here is a part of the log I've got (container_1489514660953_0015_01_01):
> {code}
> 17/03/16 11:07:04 ERROR codegen.CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 151, Column 29: A method named "compare" is not declared in any enclosing 
> class nor any supertype, nor through a static import
> /* 001 */ public Object generate(Object[] references) {
> /* 002 */   return new GeneratedIterator(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ final class GeneratedIterator extends 
> org.apache.spark.sql.execution.BufferedRowIterator {
> /* 006 */   private Object[] references;
> /* 007 */   private scala.collection.Iterator[] inputs;
> /* 008 */   private boolean agg_initAgg;
> /* 009 */   private boolean agg_bufIsNull;
> /* 010 */   private long agg_bufValue;
> /* 011 */   private boolean agg_initAgg1;
> /* 012 */   private boolean agg_bufIsNull1;
> /* 013 */   private long agg_bufValue1;
> /* 014 */   private scala.collection.Iterator smj_leftInput;
> /* 015 */   private scala.collection.Iterator smj_rightInput;
> /* 016 */   private InternalRow smj_leftRow;
> /* 017 */   private InternalRow smj_rightRow;
> /* 018 */   private UTF8String smj_value2;
> /* 019 */   private java.util.ArrayList smj_matches;
> /* 020 */   private UTF8String smj_value3;
> /* 021 */   private UTF8String smj_value4;
> /* 022 */   private org.apache.spark.sql.execution.metric.SQLMetric 
> smj_numOutputRows;
> /* 023 */   private UnsafeRow smj_result;
> /* 024 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder smj_holder;
> /* 025 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter 
> smj_rowWriter;
> /* 026 */   private org.apache.spark.sql.execution.metric.SQLMetric 
> agg_numOutputRows;
> /* 027 */   private org.apache.spark.sql.execution.metric.SQLMetric 
> agg_aggTime;
> /* 028 */   private UnsafeRow agg_result;
> /* 029 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder agg_holder;
> /* 030 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter 
> agg_rowWriter;
> /* 031 */   private org.apache.spark.sql.execution.metric.SQLMetric 
> agg_numOutputRows1;
> /* 032 */   private org.apache.spark.sql.execution.metric.SQLMetric 
> agg_aggTime1;
> /* 033 */   private UnsafeRow agg_result1;
> /* 034 */   private

[jira] [Commented] (SPARK-19984) ERROR codegen.CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java'

2017-10-03 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16190050#comment-16190050
 ] 

Kazuaki Ishizaki commented on SPARK-19984:
--

[~JohnSteidley] Thank you for providing valuable information.
I wrote a small program derived from your example that produces the following 
physical plan. I think this physical plan is almost the same as the one you 
provided; the only difference appears to be whether Parquet is used or not.
However, I cannot reproduce the problem with this small program on Spark 
2.1 or the master branch. Tomorrow, I will try using Parquet in the small program.

{code}
== Physical Plan ==
*HashAggregate(keys=[], functions=[count(A#52)], output=[count(A)#63L])
+- *HashAggregate(keys=[], functions=[partial_count(A#52)], output=[count#68L])
   +- *Project [A#52]
  +- *SortMergeJoin [A#52], [A#43], Inner
 :- *Sort [A#52 ASC NULLS FIRST], false, 0
 :  +- Exchange hashpartitioning(A#52, 1)
 : +- *Project [value#50 AS A#52]
 :+- *Filter isnotnull(value#50)
 :   +- *SerializeFromObject [staticinvoke(class 
org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, 
java.lang.String, true], true) AS value#50]
 :  +- Scan ExternalRDDScan[obj#49]
 +- *Sort [A#43 ASC NULLS FIRST], false, 0
+- *Project [id#39 AS A#43]
   +- *Filter isnotnull(id#39)
  +- *GlobalLimit 2
 +- Exchange SinglePartition
+- *LocalLimit 2
   +- *Project [value#37 AS id#39]
  +- *SerializeFromObject [staticinvoke(class 
org.apache.spark.unsafe.types.UTF8String, StringType, fromString, input[0, 
java.lang.String, true], true) AS value#37]
 +- Scan ExternalRDDScan[obj#36]
{code}

> ERROR codegen.CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java'
> -
>
> Key: SPARK-19984
> URL: https://issues.apache.org/jira/browse/SPARK-19984
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 2.1.0
>Reporter: Andrey Yakovenko
> Attachments: after_adding_count.txt, before_adding_count.txt
>
>
> I had this error a few times on my local Hadoop 2.7.3 + Spark 2.1.0 environment. 
> This is not a permanent error; the next time I run it, it could disappear. 
> Unfortunately I don't know how to reproduce the issue. As you can see from 
> the log, my logic is pretty complicated.
> Here is a part of the log I've got (container_1489514660953_0015_01_01):
> {code}
> 17/03/16 11:07:04 ERROR codegen.CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 151, Column 29: A method named "compare" is not declared in any enclosing 
> class nor any supertype, nor through a static import
> /* 001 */ public Object generate(Object[] references) {
> /* 002 */   return new GeneratedIterator(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ final class GeneratedIterator extends 
> org.apache.spark.sql.execution.BufferedRowIterator {
> /* 006 */   private Object[] references;
> /* 007 */   private scala.collection.Iterator[] inputs;
> /* 008 */   private boolean agg_initAgg;
> /* 009 */   private boolean agg_bufIsNull;
> /* 010 */   private long agg_bufValue;
> /* 011 */   private boolean agg_initAgg1;
> /* 012 */   private boolean agg_bufIsNull1;
> /* 013 */   private long agg_bufValue1;
> /* 014 */   private scala.collection.Iterator smj_leftInput;
> /* 015 */   private scala.collection.Iterator smj_rightInput;
> /* 016 */   private InternalRow smj_leftRow;
> /* 017 */   private InternalRow smj_rightRow;
> /* 018 */   private UTF8String smj_value2;
> /* 019 */   private java.util.ArrayList smj_matches;
> /* 020 */   private UTF8String smj_value3;
> /* 021 */   private UTF8String smj_value4;
> /* 022 */   private org.apache.spark.sql.execution.metric.SQLMetric 
> smj_numOutputRows;
> /* 023 */   private UnsafeRow smj_result;
> /* 024 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder smj_holder;
> /* 025 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter 
> smj_rowWriter;
> /* 026 */   private org.apache.spark.sql.execution.metric.SQLMetric 
> agg_numOutputRows;
> /* 027 */   private org.apache.spark.sql.execution.metric.SQLMetric 
> agg_aggTime;
> /* 028 */   private UnsafeRow agg_result;
> /* 029 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder agg_holder;
> /* 030 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter 
> agg_rowWriter;
> /* 031 */

[jira] [Commented] (SPARK-19984) ERROR codegen.CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java'

2017-10-02 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16188320#comment-16188320
 ] 

Kazuaki Ishizaki commented on SPARK-19984:
--

[~JohnSteidley] Thank you for your report. I also confirmed that your code snippet 
cannot reproduce this issue; without a repro it is still hard to fix.
Could you post the result of {{df.explain(true)}} for the result of the {{join}} if you 
cannot share the whole codebase?

> ERROR codegen.CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java'
> -
>
> Key: SPARK-19984
> URL: https://issues.apache.org/jira/browse/SPARK-19984
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 2.1.0
>Reporter: Andrey Yakovenko
>
> I had this error a few times on my local Hadoop 2.7.3 + Spark 2.1.0 environment. 
> This is not a permanent error; the next time I run it, it could disappear. 
> Unfortunately I don't know how to reproduce the issue. As you can see from 
> the log, my logic is pretty complicated.
> Here is a part of the log I've got (container_1489514660953_0015_01_01):
> {code}
> 17/03/16 11:07:04 ERROR codegen.CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 151, Column 29: A method named "compare" is not declared in any enclosing 
> class nor any supertype, nor through a static import
> /* 001 */ public Object generate(Object[] references) {
> /* 002 */   return new GeneratedIterator(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ final class GeneratedIterator extends 
> org.apache.spark.sql.execution.BufferedRowIterator {
> /* 006 */   private Object[] references;
> /* 007 */   private scala.collection.Iterator[] inputs;
> /* 008 */   private boolean agg_initAgg;
> /* 009 */   private boolean agg_bufIsNull;
> /* 010 */   private long agg_bufValue;
> /* 011 */   private boolean agg_initAgg1;
> /* 012 */   private boolean agg_bufIsNull1;
> /* 013 */   private long agg_bufValue1;
> /* 014 */   private scala.collection.Iterator smj_leftInput;
> /* 015 */   private scala.collection.Iterator smj_rightInput;
> /* 016 */   private InternalRow smj_leftRow;
> /* 017 */   private InternalRow smj_rightRow;
> /* 018 */   private UTF8String smj_value2;
> /* 019 */   private java.util.ArrayList smj_matches;
> /* 020 */   private UTF8String smj_value3;
> /* 021 */   private UTF8String smj_value4;
> /* 022 */   private org.apache.spark.sql.execution.metric.SQLMetric 
> smj_numOutputRows;
> /* 023 */   private UnsafeRow smj_result;
> /* 024 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder smj_holder;
> /* 025 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter 
> smj_rowWriter;
> /* 026 */   private org.apache.spark.sql.execution.metric.SQLMetric 
> agg_numOutputRows;
> /* 027 */   private org.apache.spark.sql.execution.metric.SQLMetric 
> agg_aggTime;
> /* 028 */   private UnsafeRow agg_result;
> /* 029 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder agg_holder;
> /* 030 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter 
> agg_rowWriter;
> /* 031 */   private org.apache.spark.sql.execution.metric.SQLMetric 
> agg_numOutputRows1;
> /* 032 */   private org.apache.spark.sql.execution.metric.SQLMetric 
> agg_aggTime1;
> /* 033 */   private UnsafeRow agg_result1;
> /* 034 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder agg_holder1;
> /* 035 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter 
> agg_rowWriter1;
> /* 036 */
> /* 037 */   public GeneratedIterator(Object[] references) {
> /* 038 */ this.references = references;
> /* 039 */   }
> /* 040 */
> /* 041 */   public void init(int index, scala.collection.Iterator[] inputs) {
> /* 042 */ partitionIndex = index;
> /* 043 */ this.inputs = inputs;
> /* 044 */ wholestagecodegen_init_0();
> /* 045 */ wholestagecodegen_init_1();
> /* 046 */
> /* 047 */   }
> /* 048 */
> /* 049 */   private void wholestagecodegen_init_0() {
> /* 050 */ agg_initAgg = false;
> /* 051 */
> /* 052 */ agg_initAgg1 = false;
> /* 053 */
> /* 054 */ smj_leftInput = inputs[0];
> /* 055 */ smj_rightInput = inputs[1];
> /* 056 */
> /* 057 */ smj_rightRow = null;
> /* 058 */
> /* 059 */ smj_matches = new java.util.ArrayList();
> /* 060 */
> /* 061 */ this.smj_numOutputRows = 
> (org.apache.spark.sql.execution.metric.SQLMetric) references[0];
> /* 062 */ smj_result = new UnsafeRow(2);
> /* 063 */ this.smj_holder = new 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(smj_result, 
> 64);
> /* 064 */ th

[jira] [Commented] (SPARK-18016) Code Generation: Constant Pool Past Limit for Wide/Nested Dataset

2017-09-28 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16184206#comment-16184206
 ] 

Kazuaki Ishizaki commented on SPARK-18016:
--

Thank you for reporting this again.
Although I pinged the original author in [this 
PR|https://github.com/apache/spark/pull/16648], it has not moved forward yet.

> Code Generation: Constant Pool Past Limit for Wide/Nested Dataset
> -
>
> Key: SPARK-18016
> URL: https://issues.apache.org/jira/browse/SPARK-18016
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Aleksander Eskilson
>Assignee: Aleksander Eskilson
> Fix For: 2.3.0
>
>
> When attempting to encode collections of large Java objects to Datasets 
> having very wide or deeply nested schemas, code generation can fail, yielding:
> {code}
> Caused by: org.codehaus.janino.JaninoRuntimeException: Constant pool for 
> class 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection
>  has grown past JVM limit of 0x
>   at 
> org.codehaus.janino.util.ClassFile.addToConstantPool(ClassFile.java:499)
>   at 
> org.codehaus.janino.util.ClassFile.addConstantNameAndTypeInfo(ClassFile.java:439)
>   at 
> org.codehaus.janino.util.ClassFile.addConstantMethodrefInfo(ClassFile.java:358)
>   at 
> org.codehaus.janino.UnitCompiler.writeConstantMethodrefInfo(UnitCompiler.java:4)
>   at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4547)
>   at org.codehaus.janino.UnitCompiler.access$7500(UnitCompiler.java:206)
>   at 
> org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:3774)
>   at 
> org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:3762)
>   at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:4328)
>   at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3762)
>   at 
> org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4933)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:3180)
>   at org.codehaus.janino.UnitCompiler.access$5000(UnitCompiler.java:206)
>   at 
> org.codehaus.janino.UnitCompiler$9.visitMethodInvocation(UnitCompiler.java:3151)
>   at 
> org.codehaus.janino.UnitCompiler$9.visitMethodInvocation(UnitCompiler.java:3139)
>   at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:4328)
>   at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3139)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2112)
>   at org.codehaus.janino.UnitCompiler.access$1700(UnitCompiler.java:206)
>   at 
> org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1377)
>   at 
> org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1370)
>   at org.codehaus.janino.Java$ExpressionStatement.accept(Java.java:2558)
>   at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:1370)
>   at 
> org.codehaus.janino.UnitCompiler.compileStatements(UnitCompiler.java:1450)
>   at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:2811)
>   at 
> org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1262)
>   at 
> org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1234)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:538)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:890)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:894)
>   at org.codehaus.janino.UnitCompiler.access$600(UnitCompiler.java:206)
>   at 
> org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:377)
>   at 
> org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:369)
>   at 
> org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1128)
>   at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:369)
>   at 
> org.codehaus.janino.UnitCompiler.compileDeclaredMemberTypes(UnitCompiler.java:1209)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:564)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:420)
>   at org.codehaus.janino.UnitCompiler.access$400(UnitCompiler.java:206)
>   at 
> org.codehaus.janino.UnitCompiler$2.visitPackageMemberClassDeclaration(UnitCompiler.java:374)
>   at 
> org.codehaus.janino.UnitCompiler$2.visitPackageMemberClassDeclaration(UnitCompiler.java:369)
>   at 
> org.codehaus.janino.Java$AbstractPackageMemberClassDeclaration.accept(Java.java:1309)
>   at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:369)
>   at org.codehaus

[jira] [Updated] (SPARK-22130) UTF8String.trim() inefficiently scans all white-space string twice.

2017-09-26 Thread Kazuaki Ishizaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki updated SPARK-22130:
-
Issue Type: Improvement  (was: Bug)

> UTF8String.trim() inefficiently scans all white-space string twice.
> ---
>
> Key: SPARK-22130
> URL: https://issues.apache.org/jira/browse/SPARK-22130
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Kazuaki Ishizaki
>Priority: Minor
>
> {{UTF8String.trim()}} scans a string that contains only white space (e.g. {{"
> "}}) twice, which is inefficient.
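
A minimal sketch (plain Scala over {{String}}, not {{UTF8String}}'s actual implementation) of a trim that avoids the second scan when the input is all white space:

{code}
// Sketch only: scan from the front first; if the whole string is white space,
// return the empty result immediately instead of scanning again from the end.
def trimOnce(s: String): String = {
  var start = 0
  while (start < s.length && s.charAt(start) == ' ') start += 1
  if (start == s.length) return ""          // all white space: no second scan needed
  var end = s.length - 1
  while (end > start && s.charAt(end) == ' ') end -= 1
  s.substring(start, end + 1)
}

// trimOnce("   ")    == ""
// trimOnce("  ab  ") == "ab"
{code}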






[jira] [Commented] (SPARK-16845) org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB

2017-09-26 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16181190#comment-16181190
 ] 

Kazuaki Ishizaki commented on SPARK-16845:
--

[~mvelusce] Thank you for reporting the issue with a repro. I can reproduce this.

If I am correct, Spark 2.2 can fall back to a path that disables codegen thanks to 
[this PR|https://github.com/apache/spark/pull/17087]. We once tried to backport 
this to Spark 2.1, but it was rejected.

> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" 
> grows beyond 64 KB
> -
>
> Key: SPARK-16845
> URL: https://issues.apache.org/jira/browse/SPARK-16845
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: hejie
>Assignee: Liwei Lin
> Fix For: 1.6.4, 2.0.3, 2.1.1, 2.2.0
>
> Attachments: error.txt.zip
>
>
> I have a wide table (400 columns). When I try fitting the training data on all 
> columns, a fatal error occurs.
>   ... 46 more
> Caused by: org.codehaus.janino.JaninoRuntimeException: Code of method 
> "(Lorg/apache/spark/sql/catalyst/InternalRow;Lorg/apache/spark/sql/catalyst/InternalRow;)I"
>  of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" 
> grows beyond 64 KB
>   at org.codehaus.janino.CodeContext.makeSpace(CodeContext.java:941)
>   at org.codehaus.janino.CodeContext.write(CodeContext.java:854)






[jira] [Commented] (SPARK-22130) UTF8String.trim() inefficiently scans all white-space string twice.

2017-09-26 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16181070#comment-16181070
 ] 

Kazuaki Ishizaki commented on SPARK-22130:
--

I will submit a PR soon.

> UTF8String.trim() inefficiently scans all white-space string twice.
> ---
>
> Key: SPARK-22130
> URL: https://issues.apache.org/jira/browse/SPARK-22130
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Kazuaki Ishizaki
>Priority: Minor
>
> {{UTF8String.trim()}} scans a string that contains only white space (e.g. {{"
> "}}) twice, which is inefficient.






[jira] [Created] (SPARK-22130) UTF8String.trim() inefficiently scans all white-space string twice.

2017-09-26 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-22130:


 Summary: UTF8String.trim() inefficiently scans all white-space 
string twice.
 Key: SPARK-22130
 URL: https://issues.apache.org/jira/browse/SPARK-22130
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.2.0
Reporter: Kazuaki Ishizaki
Priority: Minor


{{UTF8String.trim()}} scans a string that contains only white space (e.g. {{"
"}}) twice, which is inefficient.







[jira] [Comment Edited] (SPARK-22105) Dataframe has poor performance when computing on many columns with codegen

2017-09-22 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16176655#comment-16176655
 ] 

Kazuaki Ishizaki edited comment on SPARK-22105 at 9/22/17 4:22 PM:
---

Can these PRs at https://issues.apache.org/jira/browse/SPARK-21870 and 
https://issues.apache.org/jira/browse/SPARK-21871 alleviate this issue?


was (Author: kiszk):
Can this PR at https://issues.apache.org/jira/browse/SPARK-21871 alleviate this 
issue?

> Dataframe has poor performance when computing on many columns with codegen
> --
>
> Key: SPARK-22105
> URL: https://issues.apache.org/jira/browse/SPARK-22105
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, SQL
>Affects Versions: 2.3.0
>Reporter: Weichen Xu
>Priority: Minor
>
> Suppose we have a dataframe with many columns (e.g. 100 columns), where each column 
> is DoubleType, and we need to compute the average of each column. We will find that 
> dataframe avg is much slower than RDD.aggregate.
> I observed this issue in this PR (one-pass imputer):
> https://github.com/apache/spark/pull/18902
> I also wrote minimal test code that reproduces the issue, using a sum computation:
> https://github.com/apache/spark/compare/master...WeichenXu123:aggr_test2?expand=1
> When we compute `sum` on 100 `DoubleType` columns, dataframe avg is 
> about 3x slower than `RDD.aggregate`, but if we only compute one column, 
> dataframe avg is much faster than `RDD.aggregate`.
> The reason for this issue should be a defect in dataframe codegen: codegen 
> inlines everything and generates one large code block. When the column count 
> is large (e.g. 100 columns), the generated code becomes too large, which causes the 
> JVM to fail to JIT-compile it and fall back to bytecode interpretation.
> This PR should address the issue:
> https://github.com/apache/spark/pull/19082
> But we need more performance tests against some ML code after the above PR is 
> merged, to check whether this issue is actually fixed.
> This JIRA is used to track this performance issue.
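
A hedged, minimal sketch of the comparison described above (not the benchmark from the linked branch; sizes and names are illustrative): summing 100 {{DoubleType}} columns with the DataFrame API versus {{RDD.aggregate}}.

{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum}

val spark = SparkSession.builder().master("local[*]").appName("wide-sum-sketch").getOrCreate()

val numCols = 100
// 100 DoubleType columns derived from a simple range.
val df = spark.range(0, 1000000).select(
  (0 until numCols).map(i => (col("id") * 0.001).as(s"c$i")): _*)

// DataFrame path: one generated projection/aggregation over all 100 columns.
df.select((0 until numCols).map(i => sum(col(s"c$i"))): _*).collect()

// RDD path: a hand-written fold over Row values, for comparison.
df.rdd.aggregate(new Array[Double](numCols))(
  (acc, row) => { var i = 0; while (i < numCols) { acc(i) += row.getDouble(i); i += 1 }; acc },
  (a, b)     => { var i = 0; while (i < numCols) { a(i) += b(i); i += 1 }; a })
{code}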






[jira] [Commented] (SPARK-22105) Dataframe has poor performance when computing on many columns with codegen

2017-09-22 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16176655#comment-16176655
 ] 

Kazuaki Ishizaki commented on SPARK-22105:
--

Can this PR at https://issues.apache.org/jira/browse/SPARK-21871 alleviate this 
issue?

> Dataframe has poor performance when computing on many columns with codegen
> --
>
> Key: SPARK-22105
> URL: https://issues.apache.org/jira/browse/SPARK-22105
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, SQL
>Affects Versions: 2.3.0
>Reporter: Weichen Xu
>Priority: Minor
>
> Suppose we have a dataframe with many columns (e.g. 100 columns), where each column 
> is DoubleType, and we need to compute the average of each column. We will find that 
> dataframe avg is much slower than RDD.aggregate.
> I observed this issue in this PR (one-pass imputer):
> https://github.com/apache/spark/pull/18902
> I also wrote minimal test code that reproduces the issue, using a sum computation:
> https://github.com/apache/spark/compare/master...WeichenXu123:aggr_test2?expand=1
> When we compute `sum` on 100 `DoubleType` columns, dataframe avg is 
> about 3x slower than `RDD.aggregate`, but if we only compute one column, 
> dataframe avg is much faster than `RDD.aggregate`.
> The reason for this issue should be a defect in dataframe codegen: codegen 
> inlines everything and generates one large code block. When the column count 
> is large (e.g. 100 columns), the generated code becomes too large, which causes the 
> JVM to fail to JIT-compile it and fall back to bytecode interpretation.
> This PR should address the issue:
> https://github.com/apache/spark/pull/19082
> But we need more performance tests against some ML code after the above PR is 
> merged, to check whether this issue is actually fixed.
> This JIRA is used to track this performance issue.






[jira] [Commented] (SPARK-22000) org.codehaus.commons.compiler.CompileException: toString method is not declared

2017-09-18 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16170319#comment-16170319
 ] 

Kazuaki Ishizaki commented on SPARK-22000:
--

If there is no sample code, it may take a long time to fix this.
Is it possible to attach the whole code, or to post code that creates all of the Datasets or 
DataFrames?

> org.codehaus.commons.compiler.CompileException: toString method is not 
> declared
> ---
>
> Key: SPARK-22000
> URL: https://issues.apache.org/jira/browse/SPARK-22000
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: taiho choi
>
> The error message says that toString is not declared on "value13", which has the 
> primitive "long" type in the generated code.
> I think value13 should be of Long type.
> ==error message
> Caused by: org.codehaus.commons.compiler.CompileException: File 
> 'generated.java', Line 70, Column 32: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 70, Column 32: A method named "toString" is not declared in any enclosing 
> class nor any supertype, nor through a static import
> /* 033 */   private void apply1_2(InternalRow i) {
> /* 034 */
> /* 035 */
> /* 036 */ boolean isNull11 = i.isNullAt(1);
> /* 037 */ UTF8String value11 = isNull11 ? null : (i.getUTF8String(1));
> /* 038 */ boolean isNull10 = true;
> /* 039 */ java.lang.String value10 = null;
> /* 040 */ if (!isNull11) {
> /* 041 */
> /* 042 */   isNull10 = false;
> /* 043 */   if (!isNull10) {
> /* 044 */
> /* 045 */ Object funcResult4 = null;
> /* 046 */ funcResult4 = value11.toString();
> /* 047 */
> /* 048 */ if (funcResult4 != null) {
> /* 049 */   value10 = (java.lang.String) funcResult4;
> /* 050 */ } else {
> /* 051 */   isNull10 = true;
> /* 052 */ }
> /* 053 */
> /* 054 */
> /* 055 */   }
> /* 056 */ }
> /* 057 */ javaBean.setApp(value10);
> /* 058 */
> /* 059 */
> /* 060 */ boolean isNull13 = i.isNullAt(12);
> /* 061 */ long value13 = isNull13 ? -1L : (i.getLong(12));
> /* 062 */ boolean isNull12 = true;
> /* 063 */ java.lang.String value12 = null;
> /* 064 */ if (!isNull13) {
> /* 065 */
> /* 066 */   isNull12 = false;
> /* 067 */   if (!isNull12) {
> /* 068 */
> /* 069 */ Object funcResult5 = null;
> /* 070 */ funcResult5 = value13.toString();
> /* 071 */
> /* 072 */ if (funcResult5 != null) {
> /* 073 */   value12 = (java.lang.String) funcResult5;
> /* 074 */ } else {
> /* 075 */   isNull12 = true;
> /* 076 */ }
> /* 077 */
> /* 078 */
> /* 079 */   }
> /* 080 */ }
> /* 081 */ javaBean.setReasonCode(value12);
> /* 082 */
> /* 083 */   }






[jira] [Commented] (SPARK-22033) BufferHolder size checks should account for the specific VM array size limitations

2017-09-17 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16169318#comment-16169318
 ] 

Kazuaki Ishizaki commented on SPARK-22033:
--

I think {{ColumnVector}} and {{HashMapGrowthStrategy}} may have a similar issue.
What do you think?

> BufferHolder size checks should account for the specific VM array size 
> limitations
> --
>
> Key: SPARK-22033
> URL: https://issues.apache.org/jira/browse/SPARK-22033
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Vadim Semenov
>Priority: Minor
>
> A user may get the following OOM error while running a job with heavy 
> aggregations:
> ```
> java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder.grow(BufferHolder.java:73)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:235)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:228)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$generateResultProjection$2.apply(AggregationIterator.scala:254)
>   at 
> org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$generateResultProjection$2.apply(AggregationIterator.scala:247)
>   at 
> org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.next(ObjectAggregationIterator.scala:88)
>   at 
> org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.next(ObjectAggregationIterator.scala:33)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:167)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
>   at org.apache.spark.scheduler.Task.run(Task.scala:108)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> ```
> The [`BufferHolder.grow` tries to create a byte array of `Integer.MAX_VALUE` 
> here](https://github.com/apache/spark/blob/v2.2.0/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolder.java#L72)
>  but the maximum size of an array depends on the specifics of the VM.
> The safest value seems to be `Integer.MAX_VALUE - 8` 
> http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/java/util/ArrayList.java#l229
> In my JVM:
> ```
> java -version
> openjdk version "1.8.0_141"
> OpenJDK Runtime Environment (build 1.8.0_141-b16)
> OpenJDK 64-Bit Server VM (build 25.141-b16, mixed mode)
> ```
> the max is `new Array[Byte](Integer.MAX_VALUE - 2)`
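
A minimal sketch of the kind of cap the report suggests (names are illustrative, not the actual {{BufferHolder}} code): clamp buffer growth to a VM-safe maximum instead of requesting {{Integer.MAX_VALUE}} outright.

{code}
// Sketch only: grow by doubling, but never request more than a conservative
// VM-safe maximum (the same bound the JDK's ArrayList uses).
object SafeGrow {
  val MaxSafeArraySize: Int = Integer.MAX_VALUE - 8

  def grownSize(currentSize: Int, neededBytes: Int): Int = {
    val required = currentSize.toLong + neededBytes
    if (required > MaxSafeArraySize)
      throw new UnsupportedOperationException(
        s"Cannot grow buffer: $required bytes exceeds the safe array limit $MaxSafeArraySize")
    // Double the buffer, but clamp the result to the safe maximum.
    math.min(math.max(currentSize.toLong * 2, required), MaxSafeArraySize.toLong).toInt
  }
}

// SafeGrow.grownSize(64, 10) == 128
{code}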






[jira] [Commented] (SPARK-22000) org.codehaus.commons.compiler.CompileException: toString method is not declared

2017-09-14 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165902#comment-16165902
 ] 

Kazuaki Ishizaki commented on SPARK-22000:
--

Thank you for the good suggestion. I will try to use {{String.valueOf}}.

> org.codehaus.commons.compiler.CompileException: toString method is not 
> declared
> ---
>
> Key: SPARK-22000
> URL: https://issues.apache.org/jira/browse/SPARK-22000
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: taiho choi
>
> The error message says that toString is not declared on "value13", which has the 
> primitive "long" type in the generated code.
> I think value13 should be of Long type.
> ==error message
> Caused by: org.codehaus.commons.compiler.CompileException: File 
> 'generated.java', Line 70, Column 32: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 70, Column 32: A method named "toString" is not declared in any enclosing 
> class nor any supertype, nor through a static import
> /* 033 */   private void apply1_2(InternalRow i) {
> /* 034 */
> /* 035 */
> /* 036 */ boolean isNull11 = i.isNullAt(1);
> /* 037 */ UTF8String value11 = isNull11 ? null : (i.getUTF8String(1));
> /* 038 */ boolean isNull10 = true;
> /* 039 */ java.lang.String value10 = null;
> /* 040 */ if (!isNull11) {
> /* 041 */
> /* 042 */   isNull10 = false;
> /* 043 */   if (!isNull10) {
> /* 044 */
> /* 045 */ Object funcResult4 = null;
> /* 046 */ funcResult4 = value11.toString();
> /* 047 */
> /* 048 */ if (funcResult4 != null) {
> /* 049 */   value10 = (java.lang.String) funcResult4;
> /* 050 */ } else {
> /* 051 */   isNull10 = true;
> /* 052 */ }
> /* 053 */
> /* 054 */
> /* 055 */   }
> /* 056 */ }
> /* 057 */ javaBean.setApp(value10);
> /* 058 */
> /* 059 */
> /* 060 */ boolean isNull13 = i.isNullAt(12);
> /* 061 */ long value13 = isNull13 ? -1L : (i.getLong(12));
> /* 062 */ boolean isNull12 = true;
> /* 063 */ java.lang.String value12 = null;
> /* 064 */ if (!isNull13) {
> /* 065 */
> /* 066 */   isNull12 = false;
> /* 067 */   if (!isNull12) {
> /* 068 */
> /* 069 */ Object funcResult5 = null;
> /* 070 */ funcResult5 = value13.toString();
> /* 071 */
> /* 072 */ if (funcResult5 != null) {
> /* 073 */   value12 = (java.lang.String) funcResult5;
> /* 074 */ } else {
> /* 075 */   isNull12 = true;
> /* 076 */ }
> /* 077 */
> /* 078 */
> /* 079 */   }
> /* 080 */ }
> /* 081 */ javaBean.setReasonCode(value12);
> /* 082 */
> /* 083 */   }






[jira] [Commented] (SPARK-22000) org.codehaus.commons.compiler.CompileException: toString method is not declared

2017-09-14 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165883#comment-16165883
 ] 

Kazuaki Ishizaki commented on SPARK-22000:
--

It would be good to generate {{((Long)value13).toString()}} to reduce the amount of 
boxing/unboxing.
Anyway, as @maropu pointed out, could you please post the query? Then I will 
create a PR.

> org.codehaus.commons.compiler.CompileException: toString method is not 
> declared
> ---
>
> Key: SPARK-22000
> URL: https://issues.apache.org/jira/browse/SPARK-22000
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: taiho choi
>
> The error message says that toString is not declared on "value13", which has the 
> primitive "long" type in the generated code.
> I think value13 should be of Long type.
> ==error message
> Caused by: org.codehaus.commons.compiler.CompileException: File 
> 'generated.java', Line 70, Column 32: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 70, Column 32: A method named "toString" is not declared in any enclosing 
> class nor any supertype, nor through a static import
> /* 033 */   private void apply1_2(InternalRow i) {
> /* 034 */
> /* 035 */
> /* 036 */ boolean isNull11 = i.isNullAt(1);
> /* 037 */ UTF8String value11 = isNull11 ? null : (i.getUTF8String(1));
> /* 038 */ boolean isNull10 = true;
> /* 039 */ java.lang.String value10 = null;
> /* 040 */ if (!isNull11) {
> /* 041 */
> /* 042 */   isNull10 = false;
> /* 043 */   if (!isNull10) {
> /* 044 */
> /* 045 */ Object funcResult4 = null;
> /* 046 */ funcResult4 = value11.toString();
> /* 047 */
> /* 048 */ if (funcResult4 != null) {
> /* 049 */   value10 = (java.lang.String) funcResult4;
> /* 050 */ } else {
> /* 051 */   isNull10 = true;
> /* 052 */ }
> /* 053 */
> /* 054 */
> /* 055 */   }
> /* 056 */ }
> /* 057 */ javaBean.setApp(value10);
> /* 058 */
> /* 059 */
> /* 060 */ boolean isNull13 = i.isNullAt(12);
> /* 061 */ long value13 = isNull13 ? -1L : (i.getLong(12));
> /* 062 */ boolean isNull12 = true;
> /* 063 */ java.lang.String value12 = null;
> /* 064 */ if (!isNull13) {
> /* 065 */
> /* 066 */   isNull12 = false;
> /* 067 */   if (!isNull12) {
> /* 068 */
> /* 069 */ Object funcResult5 = null;
> /* 070 */ funcResult5 = value13.toString();
> /* 071 */
> /* 072 */ if (funcResult5 != null) {
> /* 073 */   value12 = (java.lang.String) funcResult5;
> /* 074 */ } else {
> /* 075 */   isNull12 = true;
> /* 076 */ }
> /* 077 */
> /* 078 */
> /* 079 */   }
> /* 080 */ }
> /* 081 */ javaBean.setReasonCode(value12);
> /* 082 */
> /* 083 */   }






[jira] [Commented] (SPARK-21907) NullPointerException in UnsafeExternalSorter.spill()

2017-09-08 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16158995#comment-16158995
 ] 

Kazuaki Ishizaki commented on SPARK-21907:
--

If you cannot provide a repro, could you please run your program with the 
latest master branch?
SPARK-21319 may alleviate this issue.

> NullPointerException in UnsafeExternalSorter.spill()
> 
>
> Key: SPARK-21907
> URL: https://issues.apache.org/jira/browse/SPARK-21907
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Juliusz Sompolski
>
> I see NPE during sorting with the following stacktrace:
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.spark.memory.TaskMemoryManager.getPage(TaskMemoryManager.java:383)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter$SortComparator.compare(UnsafeInMemorySorter.java:63)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter$SortComparator.compare(UnsafeInMemorySorter.java:43)
>   at 
> org.apache.spark.util.collection.TimSort.countRunAndMakeAscending(TimSort.java:270)
>   at org.apache.spark.util.collection.TimSort.sort(TimSort.java:142)
>   at org.apache.spark.util.collection.Sorter.sort(Sorter.scala:37)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.getSortedIterator(UnsafeInMemorySorter.java:345)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:206)
>   at 
> org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:203)
>   at 
> org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:281)
>   at 
> org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:90)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.reset(UnsafeInMemorySorter.java:173)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:221)
>   at 
> org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:203)
>   at 
> org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:281)
>   at 
> org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:90)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.growPointerArrayIfNecessary(UnsafeExternalSorter.java:349)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:400)
>   at 
> org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:109)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
>   at 
> org.apache.spark.sql.execution.RowIteratorFromScala.advanceNext(RowIterator.scala:83)
>   at 
> org.apache.spark.sql.execution.joins.SortMergeJoinScanner.advancedStreamed(SortMergeJoinExec.scala:778)
>   at 
> org.apache.spark.sql.execution.joins.SortMergeJoinScanner.findNextInnerJoinRows(SortMergeJoinExec.scala:685)
>   at 
> org.apache.spark.sql.execution.joins.SortMergeJoinExec$$anonfun$doExecute$1$$anon$2.advanceNext(SortMergeJoinExec.scala:259)
>   at 
> org.apache.spark.sql.execution.RowIteratorToScala.hasNext(RowIterator.scala:68)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithKeys$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
>   at org.apache.spark.scheduler.Task.run(Task.scala:108)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:346)
>   at 
> java.util

[jira] [Commented] (SPARK-21905) ClassCastException when call sqlContext.sql on temp table

2017-09-08 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16158496#comment-16158496
 ] 

Kazuaki Ishizaki commented on SPARK-21905:
--

When I ran the following code (I do not have the PointUDT and Point classes), I 
could not see the exception using the master branch or branch-2.2.

{code}
...
import org.apache.spark.sql.catalyst.encoders._
...
import org.apache.spark.sql.types._

  test("SPARK-21905") {
val schema = StructType(List(
  StructField("name", DataTypes.StringType, true),
  StructField("location", new ExamplePointUDT, true)))

val rowRdd = sqlContext.sparkContext.parallelize(Seq("bluejoe", "alex"), 4)
  .map({ x: String => Row.fromSeq(Seq(x, new ExamplePoint(100, 100))) })
val dataFrame = sqlContext.createDataFrame(rowRdd, schema)
dataFrame.createOrReplaceTempView("person")
sqlContext.sql("SELECT * FROM person").foreach(println(_))
  }
{code}

> ClassCastException when call sqlContext.sql on temp table
> -
>
> Key: SPARK-21905
> URL: https://issues.apache.org/jira/browse/SPARK-21905
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: bluejoe
>
> {code:java}
> val schema = StructType(List(
>   StructField("name", DataTypes.StringType, true),
>   StructField("location", new PointUDT, true)))
> val rowRdd = sqlContext.sparkContext.parallelize(Seq("bluejoe", "alex"), 
> 4).map({ x: String ⇒ Row.fromSeq(Seq(x, Point(100, 100))) });
> val dataFrame = sqlContext.createDataFrame(rowRdd, schema)
> dataFrame.createOrReplaceTempView("person");
> sqlContext.sql("SELECT * FROM person").foreach(println(_));
> {code}
> the last statement throws exception:
> {code:java}
> Caused by: java.lang.ClassCastException: 
> org.apache.spark.sql.catalyst.expressions.GenericRow cannot be cast to 
> org.apache.spark.sql.catalyst.InternalRow
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.evalIfFalseExpr1$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:287)
>   ... 18 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21946) Flaky test: InMemoryCatalogedDDLSuite.`alter table: rename cached table`

2017-09-07 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16158044#comment-16158044
 ] 

Kazuaki Ishizaki commented on SPARK-21946:
--

If no one has started working on this, I will create a PR.

> Flaky test: InMemoryCatalogedDDLSuite.`alter table: rename cached table`
> 
>
> Key: SPARK-21946
> URL: https://issues.apache.org/jira/browse/SPARK-21946
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>
> According to the [Apache Spark Jenkins 
> History|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7/lastCompletedBuild/testReport/org.apache.spark.sql.execution.command/InMemoryCatalogedDDLSuite/alter_table__rename_cached_table/history/]
> InMemoryCatalogedDDLSuite.`alter table: rename cached table` is very flaky. 
> We had better stablize this.
> {code}
> - alter table: rename cached table !!! CANCELED !!!
>   Array([2,2], [1,1]) did not equal Array([1,1], [2,2]) bad test: wrong data 
> (DDLSuite.scala:786)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21907) NullPointerException in UnsafeExternalSorter.spill()

2017-09-06 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156015#comment-16156015
 ] 

Kazuaki Ishizaki commented on SPARK-21907:
--

Thank you for your report. Could you please attach a program that can reproduce 
this issue?

> NullPointerException in UnsafeExternalSorter.spill()
> 
>
> Key: SPARK-21907
> URL: https://issues.apache.org/jira/browse/SPARK-21907
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Juliusz Sompolski
>
> I see NPE during sorting with the following stacktrace:
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.spark.memory.TaskMemoryManager.getPage(TaskMemoryManager.java:383)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter$SortComparator.compare(UnsafeInMemorySorter.java:63)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter$SortComparator.compare(UnsafeInMemorySorter.java:43)
>   at 
> org.apache.spark.util.collection.TimSort.countRunAndMakeAscending(TimSort.java:270)
>   at org.apache.spark.util.collection.TimSort.sort(TimSort.java:142)
>   at org.apache.spark.util.collection.Sorter.sort(Sorter.scala:37)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.getSortedIterator(UnsafeInMemorySorter.java:345)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:206)
>   at 
> org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:203)
>   at 
> org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:281)
>   at 
> org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:90)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.reset(UnsafeInMemorySorter.java:173)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:221)
>   at 
> org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:203)
>   at 
> org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:281)
>   at 
> org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:90)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.growPointerArrayIfNecessary(UnsafeExternalSorter.java:349)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:400)
>   at 
> org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:109)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
>   at 
> org.apache.spark.sql.execution.RowIteratorFromScala.advanceNext(RowIterator.scala:83)
>   at 
> org.apache.spark.sql.execution.joins.SortMergeJoinScanner.advancedStreamed(SortMergeJoinExec.scala:778)
>   at 
> org.apache.spark.sql.execution.joins.SortMergeJoinScanner.findNextInnerJoinRows(SortMergeJoinExec.scala:685)
>   at 
> org.apache.spark.sql.execution.joins.SortMergeJoinExec$$anonfun$doExecute$1$$anon$2.advanceNext(SortMergeJoinExec.scala:259)
>   at 
> org.apache.spark.sql.execution.RowIteratorToScala.hasNext(RowIterator.scala:68)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithKeys$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
>   at org.apache.spark.scheduler.Task.run(Task.scala:108)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:346)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(T

[jira] [Commented] (SPARK-21894) Some Netty errors do not propagate to the top level driver

2017-09-03 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16151930#comment-16151930
 ] 

Kazuaki Ishizaki commented on SPARK-21894:
--

Thank you for reporting this issue. Could you please attach a smaller program 
that can reproduce this problem?

> Some Netty errors do not propagate to the top level driver
> --
>
> Key: SPARK-21894
> URL: https://issues.apache.org/jira/browse/SPARK-21894
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: Charles Allen
>
> We have an environment with Netty 4.1 ( 
> https://issues.apache.org/jira/browse/SPARK-19552 for some context) and the 
> following error occurs. The reason THIS issue is being filed is because this 
> error leaves the Spark workload in a bad state where it does not make any 
> progress, and does not shut down.
> The expected behavior is that the spark job would throw an exception that can 
> be caught by the driving application.
> {code}
> 017-09-01T16:13:32,175 ERROR [shuffle-server-3-2] 
> org.apache.spark.network.server.TransportRequestHandler - Error sending 
> result StreamResponse{streamId=/jars/lz4-1.3.0.jar, byteCount=236880, 
> body=FileSegmentManagedBuffer{file=/Users/charlesallen/.m2/repository/net/jpountz/lz4/lz4/1.3.0/lz4-1.3.0.jar,
>  offset=0, length=236880}} to /192.168.59.3:56703; closing connection
> java.lang.AbstractMethodError
>   at io.netty.util.ReferenceCountUtil.touch(ReferenceCountUtil.java:73) 
> ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
>   at 
> io.netty.channel.DefaultChannelPipeline.touch(DefaultChannelPipeline.java:107)
>  ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:810)
>  ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:723)
>  ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
>   at 
> io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:111)
>  ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738)
>  ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:730)
>  ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:816)
>  ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:723)
>  ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
>   at 
> io.netty.handler.timeout.IdleStateHandler.write(IdleStateHandler.java:305) 
> ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738)
>  ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:801)
>  ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:814)
>  ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:794)
>  ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:831)
>  ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
>   at 
> io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1032)
>  ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
>   at 
> io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:296) 
> ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
>   at 
> org.apache.spark.network.server.TransportRequestHandler.respond(TransportRequestHandler.java:194)
>  [spark-network-common_2.11-2.1.0-mmx9.jar:2.1.0-mmx9]
>   at 
> org.apache.spark.network.server.TransportRequestHandler.processStreamRequest(TransportRequestHandler.java:150)
>  [spark-network-common_2.11-2.1.0-mmx9.jar:2.1.0-mmx9]
>   at 
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:111)
>  [spark-network-common_2.11-2.1.0-mmx9.jar:2.1.0-mmx9]
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:119)
>  [spark-network-common_2.11-2.1.0-mmx9.jar:2.1.0-mmx9]
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
> 

[jira] [Commented] (SPARK-18016) Code Generation: Constant Pool Past Limit for Wide/Nested Dataset

2017-08-25 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16141650#comment-16141650
 ] 

Kazuaki Ishizaki commented on SPARK-18016:
--

The issue {{Caused by: org.codehaus.janino.JaninoRuntimeException: Constant 
pool for class 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection
 has grown past JVM limit of 0x}} will be addressed by [this 
PR|https://github.com/apache/spark/pull/16648].
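
For reference, here is a minimal sketch (the column count and the use of literal columns are my own assumptions, not taken from this report) of the kind of very wide Dataset that can push generated projection code toward the constant pool limit on affected versions; whether it actually fails depends on the Spark version and the schema width:

{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

val spark = SparkSession.builder().master("local[1]").getOrCreate()

// A single-row DataFrame with a few thousand columns; projections generated over a
// schema this wide are what grow the class constant pool on affected versions.
val wide = spark.range(1).select((1 to 4000).map(i => lit(i).as(s"c$i")): _*)

// Forcing a projection over every column exercises the generated UnsafeProjection.
wide.selectExpr((1 to 4000).map(i => s"c$i + 1"): _*).count()
{code}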

> Code Generation: Constant Pool Past Limit for Wide/Nested Dataset
> -
>
> Key: SPARK-18016
> URL: https://issues.apache.org/jira/browse/SPARK-18016
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Aleksander Eskilson
>Assignee: Aleksander Eskilson
> Fix For: 2.3.0
>
>
> When attempting to encode collections of large Java objects to Datasets 
> having very wide or deeply nested schemas, code generation can fail, yielding:
> {code}
> Caused by: org.codehaus.janino.JaninoRuntimeException: Constant pool for 
> class 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection
>  has grown past JVM limit of 0x
>   at 
> org.codehaus.janino.util.ClassFile.addToConstantPool(ClassFile.java:499)
>   at 
> org.codehaus.janino.util.ClassFile.addConstantNameAndTypeInfo(ClassFile.java:439)
>   at 
> org.codehaus.janino.util.ClassFile.addConstantMethodrefInfo(ClassFile.java:358)
>   at 
> org.codehaus.janino.UnitCompiler.writeConstantMethodrefInfo(UnitCompiler.java:4)
>   at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4547)
>   at org.codehaus.janino.UnitCompiler.access$7500(UnitCompiler.java:206)
>   at 
> org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:3774)
>   at 
> org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:3762)
>   at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:4328)
>   at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3762)
>   at 
> org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4933)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:3180)
>   at org.codehaus.janino.UnitCompiler.access$5000(UnitCompiler.java:206)
>   at 
> org.codehaus.janino.UnitCompiler$9.visitMethodInvocation(UnitCompiler.java:3151)
>   at 
> org.codehaus.janino.UnitCompiler$9.visitMethodInvocation(UnitCompiler.java:3139)
>   at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:4328)
>   at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3139)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2112)
>   at org.codehaus.janino.UnitCompiler.access$1700(UnitCompiler.java:206)
>   at 
> org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1377)
>   at 
> org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1370)
>   at org.codehaus.janino.Java$ExpressionStatement.accept(Java.java:2558)
>   at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:1370)
>   at 
> org.codehaus.janino.UnitCompiler.compileStatements(UnitCompiler.java:1450)
>   at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:2811)
>   at 
> org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1262)
>   at 
> org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1234)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:538)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:890)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:894)
>   at org.codehaus.janino.UnitCompiler.access$600(UnitCompiler.java:206)
>   at 
> org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:377)
>   at 
> org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:369)
>   at 
> org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1128)
>   at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:369)
>   at 
> org.codehaus.janino.UnitCompiler.compileDeclaredMemberTypes(UnitCompiler.java:1209)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:564)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:420)
>   at org.codehaus.janino.UnitCompiler.access$400(UnitCompiler.java:206)
>   at 
> org.codehaus.janino.UnitCompiler$2.visitPackageMemberClassDeclaration(UnitCompiler.java:374)
>   at 
> org.codehaus.janino.UnitCompiler$2.visitPackageMemberClassDeclaration(UnitCompiler.java:369)
>   at 
> org.codehaus.janino.Java$AbstractPackageMemberCl

[jira] [Commented] (SPARK-21828) org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB...again

2017-08-25 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16141361#comment-16141361
 ] 

Kazuaki Ishizaki commented on SPARK-21828:
--

Thank you for your report. Some fixes solved this problem in Spark 2.2, but 
they were not backported to Spark 2.1.
If you need a backport to 2.1, please let us know here. I will then start 
identifying the root cause of this issue and backporting the relevant PR.

> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" 
> grows beyond 64 KB...again
> -
>
> Key: SPARK-21828
> URL: https://issues.apache.org/jira/browse/SPARK-21828
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Otis Smart
>Priority: Critical
>
> Hello!
> 1. I encounter a similar issue (see below text) on Pyspark 2.2 (e.g., 
> dataframe with ~5 rows x 1100+ columns as input to ".fit()" method of 
> CrossValidator() that includes Pipeline() that includes StringIndexer(), 
> VectorAssembler() and DecisionTreeClassifier()).
> 2. Was the aforementioned patch (aka 
> fix(https://github.com/apache/spark/pull/15480) not included in the latest 
> release; what are the reason and (source) of and solution to this persistent 
> issue please?
> py4j.protocol.Py4JJavaError: An error occurred while calling o9396.fit.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 38 
> in stage 18.0 failed 4 times, most recent failure: Lost task 38.3 in stage 
> 18.0 (TID 1996, ip-10-0-14-83.ec2.internal, executor 4): 
> java.util.concurrent.ExecutionException: java.lang.Exception: failed to 
> compile: org.codehaus.janino.JaninoRuntimeException: Code of method 
> "compare(Lorg/apache/spark/sql/catalyst/InternalRow;Lorg/apache/spark/sql/catalyst/InternalRow;)I"
>  of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" 
> grows beyond 64 KB
> /* 001 */ public SpecificOrdering generate(Object[] references)
> { /* 002 */ return new SpecificOrdering(references); /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificOrdering extends 
> org.apache.spark.sql.catalyst.expressions.codegen.BaseOrdering {
> /* 006 */
> /* 007 */ private Object[] references;
> /* 008 */
> /* 009 */
> /* 010 */ public SpecificOrdering(Object[] references)
> { /* 011 */ this.references = references; /* 012 */ /* 013 */ }
> /* 014 */
> /* 015 */
> /* 016 */
> /* 017 */ public int compare(InternalRow a, InternalRow b) {
> /* 018 */ InternalRow i = null; // Holds current row being evaluated.
> /* 019 */
> /* 020 */ i = a;
> /* 021 */ boolean isNullA;
> /* 022 */ double primitiveA;
> /* 023 */
> { /* 024 */ /* 025 */ double value = i.getDouble(0); /* 026 */ isNullA = 
> false; /* 027 */ primitiveA = value; /* 028 */ }
> /* 029 */ i = b;
> /* 030 */ boolean isNullB;
> /* 031 */ double primitiveB;
> /* 032 */
> { /* 033 */ /* 034 */ double value = i.getDouble(0); /* 035 */ isNullB = 
> false; /* 036 */ primitiveB = value; /* 037 */ }
> /* 038 */ if (isNullA && isNullB)
> { /* 039 */ // Nothing /* 040 */ }
> else if (isNullA)
> { /* 041 */ return -1; /* 042 */ }
> else if (isNullB)
> { /* 043 */ return 1; /* 044 */ }
> else {
> /* 045 */ int comp = 
> org.apache.spark.util.Utils.nanSafeCompareDoubles(primitiveA, primitiveB);
> /* 046 */ if (comp != 0)
> { /* 047 */ return comp; /* 048 */ }
> /* 049 */ }
> /* 050 */
> /* 051 */
> ...



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21828) org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB...again

2017-08-24 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16140282#comment-16140282
 ] 

Kazuaki Ishizaki commented on SPARK-21828:
--

Thank you for reporting this problem.
First, IIUC, this PR (https://github.com/apache/spark/pull/15480) has been 
included in the latest release; thus, the test case "SPARK-16845..." in 
{{OrderingSuite.scala}} does not fail.

Could you please attach a program that can reproduce this issue? Then I will 
investigate it.

> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" 
> grows beyond 64 KB...again
> -
>
> Key: SPARK-21828
> URL: https://issues.apache.org/jira/browse/SPARK-21828
> Project: Spark
>  Issue Type: Bug
>  Components: ML, SQL
>Affects Versions: 2.2.0
>Reporter: Otis Smart
>Priority: Critical
>
> Hello!
> 1. I encounter a similar issue (see below text) on Pyspark 2.2 (e.g., 
> dataframe with ~5 rows x 1100+ columns as input to ".fit()" method of 
> CrossValidator() that includes Pipeline() that includes StringIndexer(), 
> VectorAssembler() and DecisionTreeClassifier()).
> 2. Was the aforementioned patch (aka 
> fix(https://github.com/apache/spark/pull/15480) not included in the latest 
> release; what are the reason and (source) of and solution to this persistent 
> issue please?
> py4j.protocol.Py4JJavaError: An error occurred while calling o9396.fit.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 38 
> in stage 18.0 failed 4 times, most recent failure: Lost task 38.3 in stage 
> 18.0 (TID 1996, ip-10-0-14-83.ec2.internal, executor 4): 
> java.util.concurrent.ExecutionException: java.lang.Exception: failed to 
> compile: org.codehaus.janino.JaninoRuntimeException: Code of method 
> "compare(Lorg/apache/spark/sql/catalyst/InternalRow;Lorg/apache/spark/sql/catalyst/InternalRow;)I"
>  of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" 
> grows beyond 64 KB
> /* 001 */ public SpecificOrdering generate(Object[] references)
> { /* 002 */ return new SpecificOrdering(references); /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificOrdering extends 
> org.apache.spark.sql.catalyst.expressions.codegen.BaseOrdering {
> /* 006 */
> /* 007 */ private Object[] references;
> /* 008 */
> /* 009 */
> /* 010 */ public SpecificOrdering(Object[] references)
> { /* 011 */ this.references = references; /* 012 */ /* 013 */ }
> /* 014 */
> /* 015 */
> /* 016 */
> /* 017 */ public int compare(InternalRow a, InternalRow b) {
> /* 018 */ InternalRow i = null; // Holds current row being evaluated.
> /* 019 */
> /* 020 */ i = a;
> /* 021 */ boolean isNullA;
> /* 022 */ double primitiveA;
> /* 023 */
> { /* 024 */ /* 025 */ double value = i.getDouble(0); /* 026 */ isNullA = 
> false; /* 027 */ primitiveA = value; /* 028 */ }
> /* 029 */ i = b;
> /* 030 */ boolean isNullB;
> /* 031 */ double primitiveB;
> /* 032 */
> { /* 033 */ /* 034 */ double value = i.getDouble(0); /* 035 */ isNullB = 
> false; /* 036 */ primitiveB = value; /* 037 */ }
> /* 038 */ if (isNullA && isNullB)
> { /* 039 */ // Nothing /* 040 */ }
> else if (isNullA)
> { /* 041 */ return -1; /* 042 */ }
> else if (isNullB)
> { /* 043 */ return 1; /* 044 */ }
> else {
> /* 045 */ int comp = 
> org.apache.spark.util.Utils.nanSafeCompareDoubles(primitiveA, primitiveB);
> /* 046 */ if (comp != 0)
> { /* 047 */ return comp; /* 048 */ }
> /* 049 */ }
> /* 050 */
> /* 051 */
> ...



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21750) Use arrow 0.6.0

2017-08-22 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16136920#comment-16136920
 ] 

Kazuaki Ishizaki commented on SPARK-21750:
--

Closed this since upgrading Arrow requires upgrading the Jenkins environment for 
the Python side. For now, it is not necessary to upgrade Arrow on the Python 
side. Details are in the discussion in the PR.

> Use arrow 0.6.0
> ---
>
> Key: SPARK-21750
> URL: https://issues.apache.org/jira/browse/SPARK-21750
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>Priority: Minor
>
> Since [Arrow 0.6.0|http://arrow.apache.org/release/0.6.0.html] has been 
> released, use the latest one



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-21750) Use arrow 0.6.0

2017-08-22 Thread Kazuaki Ishizaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki closed SPARK-21750.

Resolution: Won't Fix

> Use arrow 0.6.0
> ---
>
> Key: SPARK-21750
> URL: https://issues.apache.org/jira/browse/SPARK-21750
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>Priority: Minor
>
> Since [Arrow 0.6.0|http://arrow.apache.org/release/0.6.0.html] has been 
> released, use the latest one



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21794) exception about reading task serial data(broadcast) value when the storage memory is not enough to unroll

2017-08-20 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16134440#comment-16134440
 ] 

Kazuaki Ishizaki commented on SPARK-21794:
--

Thank you for reporting this issue. Could you please attach a program that can 
reproduce this problem?

> exception about reading task serial data(broadcast) value when the storage 
> memory is not enough to unroll
> -
>
> Key: SPARK-21794
> URL: https://issues.apache.org/jira/browse/SPARK-21794
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.1, 2.1.1
>Reporter: roncenzhao
> Attachments: error stack.png
>
>
> ```
> 17/08/09 19:27:43 ERROR Utils: Exception encountered
> java.util.NoSuchElementException
>   at 
> org.apache.spark.util.collection.PrimitiveVector$$anon$1.next(PrimitiveVector.scala:58)
>   at 
> org.apache.spark.storage.memory.PartiallyUnrolledIterator.next(MemoryStore.scala:697)
>   at 
> org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:30)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$2.apply(TorrentBroadcast.scala:178)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$2.apply(TorrentBroadcast.scala:178)
>   at scala.Option.map(Option.scala:146)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:178)
>   at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1276)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:174)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:65)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:65)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:89)
>   at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:72)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
>   at org.apache.spark.scheduler.Task.run(Task.scala:86)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> 17/08/09 19:27:43 INFO UnifiedMemoryManager: Will not store broadcast_5 as 
> the required space (1048576 bytes) exceeds our memory limit (878230 bytes)
> 17/08/09 19:27:43 WARN MemoryStore: Failed to reserve initial memory 
> threshold of 1024.0 KB for computing block broadcast_5 in memory.
> 17/08/09 19:27:43 WARN MemoryStore: Not enough space to cache broadcast_5 in 
> memory! (computed 384.0 B so far)
> 17/08/09 19:27:43 INFO MemoryStore: Memory use = 857.6 KB (blocks) + 0.0 B 
> (scratch space shared across 0 tasks(s)) = 857.6 KB. Storage limit = 857.6 KB.
> 17/08/09 19:27:43 ERROR Utils: Exception encountered
> java.util.NoSuchElementException
>   at 
> org.apache.spark.util.collection.PrimitiveVector$$anon$1.next(PrimitiveVector.scala:58)
>   at 
> org.apache.spark.storage.memory.PartiallyUnrolledIterator.next(MemoryStore.scala:697)
>   at 
> org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:30)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$2.apply(TorrentBroadcast.scala:178)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$2.apply(TorrentBroadcast.scala:178)
>   at scala.Option.map(Option.scala:146)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:178)
>   at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1276)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:174)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:65)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:65)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:89)
>   at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:72)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
>   at org.apache.spark.scheduler.Task.run(Task.scala:86)
>   a

[jira] [Commented] (SPARK-21776) How to use the memory-mapped file on Spark??

2017-08-17 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131681#comment-16131681
 ] 

Kazuaki Ishizaki commented on SPARK-21776:
--

Is this a question? If so, it would be better to send a message to 
u...@spark.apache.org or d...@spark.apache.org.

> How to use the memory-mapped file on Spark??
> 
>
> Key: SPARK-21776
> URL: https://issues.apache.org/jira/browse/SPARK-21776
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager, Documentation, Input/Output, Spark Core
>Affects Versions: 2.1.1
> Environment: Spark 2.1.1 
> Scala 2.11.8
>Reporter: zhaP524
> Attachments: screenshot-1.png, screenshot-2.png
>
>
>   In production, we have to use Spark to fully load an HBase table and join 
> it with a dimension table to generate business data. Because the base table is 
> loaded in full, memory pressure is very high, so I want to know whether Spark 
> can use memory-mapped files to handle this. Is there such a mechanism? How is 
> it used?
>   I also found a Spark parameter, spark.storage.memoryMapThreshold=2m, but it 
> is not very clear what this parameter is used for.
>   There are putBytes and getBytes methods in DiskStore.scala in the Spark 
> source code; are these the memory-mapped files mentioned above? How should I 
> understand them?
>   Let me know if you have any trouble...
> Best wishes to you!!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21720) Filter predicate with many conditions throw stackoverflow error

2017-08-17 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16130231#comment-16130231
 ] 

Kazuaki Ishizaki commented on SPARK-21720:
--

I identified issues in {{predicates.scala}}. I am creating fixes.

> Filter predicate with many conditions throw stackoverflow error
> ---
>
> Key: SPARK-21720
> URL: https://issues.apache.org/jira/browse/SPARK-21720
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: srinivasan
>
> When trying to filter on dataset with many predicate conditions on both spark 
> sql and dataset filter transformation as described below, spark throws a 
> stackoverflow exception
> Case 1: Filter Transformation on Data
> Dataset filter = sourceDataset.filter(String.format("not(%s)", 
> buildQuery()));
> filter.show();
> where buildQuery() returns
> Field1 = "" and  Field2 = "" and  Field3 = "" and  Field4 = "" and  Field5 = 
> "" and  BLANK_5 = "" and  Field7 = "" and  Field8 = "" and  Field9 = "" and  
> Field10 = "" and  Field11 = "" and  Field12 = "" and  Field13 = "" and  
> Field14 = "" and  Field15 = "" and  Field16 = "" and  Field17 = "" and  
> Field18 = "" and  Field19 = "" and  Field20 = "" and  Field21 = "" and  
> Field22 = "" and  Field23 = "" and  Field24 = "" and  Field25 = "" and  
> Field26 = "" and  Field27 = "" and  Field28 = "" and  Field29 = "" and  
> Field30 = "" and  Field31 = "" and  Field32 = "" and  Field33 = "" and  
> Field34 = "" and  Field35 = "" and  Field36 = "" and  Field37 = "" and  
> Field38 = "" and  Field39 = "" and  Field40 = "" and  Field41 = "" and  
> Field42 = "" and  Field43 = "" and  Field44 = "" and  Field45 = "" and  
> Field46 = "" and  Field47 = "" and  Field48 = "" and  Field49 = "" and  
> Field50 = "" and  Field51 = "" and  Field52 = "" and  Field53 = "" and  
> Field54 = "" and  Field55 = "" and  Field56 = "" and  Field57 = "" and  
> Field58 = "" and  Field59 = "" and  Field60 = "" and  Field61 = "" and  
> Field62 = "" and  Field63 = "" and  Field64 = "" and  Field65 = "" and  
> Field66 = "" and  Field67 = "" and  Field68 = "" and  Field69 = "" and  
> Field70 = "" and  Field71 = "" and  Field72 = "" and  Field73 = "" and  
> Field74 = "" and  Field75 = "" and  Field76 = "" and  Field77 = "" and  
> Field78 = "" and  Field79 = "" and  Field80 = "" and  Field81 = "" and  
> Field82 = "" and  Field83 = "" and  Field84 = "" and  Field85 = "" and  
> Field86 = "" and  Field87 = "" and  Field88 = "" and  Field89 = "" and  
> Field90 = "" and  Field91 = "" and  Field92 = "" and  Field93 = "" and  
> Field94 = "" and  Field95 = "" and  Field96 = "" and  Field97 = "" and  
> Field98 = "" and  Field99 = "" and  Field100 = "" and  Field101 = "" and  
> Field102 = "" and  Field103 = "" and  Field104 = "" and  Field105 = "" and  
> Field106 = "" and  Field107 = "" and  Field108 = "" and  Field109 = "" and  
> Field110 = "" and  Field111 = "" and  Field112 = "" and  Field113 = "" and  
> Field114 = "" and  Field115 = "" and  Field116 = "" and  Field117 = "" and  
> Field118 = "" and  Field119 = "" and  Field120 = "" and  Field121 = "" and  
> Field122 = "" and  Field123 = "" and  Field124 = "" and  Field125 = "" and  
> Field126 = "" and  Field127 = "" and  Field128 = "" and  Field129 = "" and  
> Field130 = "" and  Field131 = "" and  Field132 = "" and  Field133 = "" and  
> Field134 = "" and  Field135 = "" and  Field136 = "" and  Field137 = "" and  
> Field138 = "" and  Field139 = "" and  Field140 = "" and  Field141 = "" and  
> Field142 = "" and  Field143 = "" and  Field144 = "" and  Field145 = "" and  
> Field146 = "" and  Field147 = "" and  Field148 = "" and  Field149 = "" and  
> Field150 = "" and  Field151 = "" and  Field152 = "" and  Field153 = "" and  
> Field154 = "" and  Field155 = "" and  Field156 = "" and  Field157 = "" and  
> Field158 = "" and  Field159 = "" and  Field160 = "" and  Field161 = "" and  
> Field162 = "" and  Field163 = "" and  Field164 = "" and  Field165 = "" and  
> Field166 = "" and  Field167 = "" and  Field168 = "" and  Field169 = "" and  
> Field170 = "" and  Field171 = "" and  Field172 = "" and  Field173 = "" and  
> Field174 = "" and  Field175 = "" and  Field176 = "" and  Field177 = "" and  
> Field178 = "" and  Field179 = "" and  Field180 = "" and  Field181 = "" and  
> Field182 = "" and  Field183 = "" and  Field184 = "" and  Field185 = "" and  
> Field186 = "" and  Field187 = "" and  Field188 = "" and  Field189 = "" and  
> Field190 = "" and  Field191 = "" and  Field192 = "" and  Field193 = "" and  
> Field194 = "" and  Field195 = "" and  Field196 = "" and  Field197 = "" and  
> Field198 = "" and  Field199 = "" and  Field200 = "" and  Field201 = "" and  
> Field202 = "" and  Field203 = "" and  Field204

[jira] [Created] (SPARK-21751) CodeGenerator.splitExpressions counts code size more precisely

2017-08-16 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-21751:


 Summary: CodeGenerator.splitExpressions counts code size more precisely
 Key: SPARK-21751
 URL: https://issues.apache.org/jira/browse/SPARK-21751
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.2.0
Reporter: Kazuaki Ishizaki
Priority: Minor


Currently, {{CodeGenerator.splitExpressions}} splits statements when their total 
length exceeds 1200 characters. This count may include comments and empty lines.
It would be good to exclude comments and empty lines so that fewer methods are 
generated in a class.
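
As a rough illustration (this is only a sketch, not Spark's actual {{CodeGenerator}} implementation), counting only the characters of non-blank, non-comment lines toward the 1200-character threshold could look like this:

{code}
// Sketch only: count code size while ignoring blank lines and comment lines, so that
// only real statements contribute to the split threshold.
def effectiveCodeLength(code: String): Int =
  code.split("\n")
    .map(_.trim)
    .filterNot(l => l.isEmpty || l.startsWith("//") || l.startsWith("/*") || l.startsWith("*"))
    .map(_.length)
    .sum

val block =
  """// projection for column 0
    |boolean isNull0 = i.isNullAt(0);
    |
    |double value0 = isNull0 ? -1.0D : i.getDouble(0);""".stripMargin

// Only the two statement lines are counted, so comment-heavy code splits less eagerly.
assert(effectiveCodeLength(block) < block.length)
{code}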



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21750) Use arrow 0.6.0

2017-08-16 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129133#comment-16129133
 ] 

Kazuaki Ishizaki commented on SPARK-21750:
--

Waiting for it to appear on mvnrepository

> Use arrow 0.6.0
> ---
>
> Key: SPARK-21750
> URL: https://issues.apache.org/jira/browse/SPARK-21750
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>Priority: Minor
>
> Since [Arrow 0.6.0|http://arrow.apache.org/release/0.6.0.html] has been 
> released, use the latest one



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21750) Use arrow 0.6.0

2017-08-16 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-21750:


 Summary: Use arrow 0.6.0
 Key: SPARK-21750
 URL: https://issues.apache.org/jira/browse/SPARK-21750
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.3.0
Reporter: Kazuaki Ishizaki
Priority: Minor


Since [Arrow 0.6.0|http://arrow.apache.org/release/0.6.0.html] has been 
released, use the latest one



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21720) Filter predicate with many conditions throw stackoverflow error

2017-08-15 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127477#comment-16127477
 ] 

Kazuaki Ishizaki commented on SPARK-21720:
--

In this case, adding the JVM option {{-Xss512m}} eliminates this exception and 
the query works well.
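
For anyone trying to reproduce this locally, the sketch below builds a predicate of the same shape as the reporter's {{buildQuery()}} (the field count, column contents, and single-quoted literals are my own simplifications, not taken from the report). Note that {{-Xss512m}} has to be supplied when the driver JVM starts, e.g. via {{spark-submit --driver-java-options "-Xss512m"}}, rather than set afterwards:

{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

val spark = SparkSession.builder().master("local[1]").getOrCreate()

// One row with 400 empty string columns named Field1..Field400 (the count is arbitrary).
val df = spark.range(1).select((1 to 400).map(i => lit("").as(s"Field$i")): _*)

// A long chain of AND-ed equality conditions, like buildQuery() in the report;
// parsing and analyzing this deeply nested condition is what recurses so deeply.
val conditions = (1 to 400).map(i => s"Field$i = ''").mkString(" and ")
df.filter(s"not($conditions)").show()
{code}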

When the number of fields is 1024, I got the following exception:
{code}
08:41:40.022 ERROR 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator: failed to 
compile: org.codehaus.janino.JaninoRuntimeException: Code of method 
"apply(Lorg/apache/spark/sql/catalyst/InternalRow;)Lorg/apache/spark/sql/catalyst/expressions/UnsafeRow;"
 of class 
"org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection"
 grows beyond 64 KB
...
{code}

I am working on solving this 64KB problem.

> Filter predicate with many conditions throw stackoverflow error
> ---
>
> Key: SPARK-21720
> URL: https://issues.apache.org/jira/browse/SPARK-21720
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: srinivasan
>
> When trying to filter on dataset with many predicate conditions on both spark 
> sql and dataset filter transformation as described below, spark throws a 
> stackoverflow exception
> Case 1: Filter Transformation on Data
> Dataset filter = sourceDataset.filter(String.format("not(%s)", 
> buildQuery()));
> filter.show();
> where buildQuery() returns
> Field1 = "" and  Field2 = "" and  Field3 = "" and  Field4 = "" and  Field5 = 
> "" and  BLANK_5 = "" and  Field7 = "" and  Field8 = "" and  Field9 = "" and  
> Field10 = "" and  Field11 = "" and  Field12 = "" and  Field13 = "" and  
> Field14 = "" and  Field15 = "" and  Field16 = "" and  Field17 = "" and  
> Field18 = "" and  Field19 = "" and  Field20 = "" and  Field21 = "" and  
> Field22 = "" and  Field23 = "" and  Field24 = "" and  Field25 = "" and  
> Field26 = "" and  Field27 = "" and  Field28 = "" and  Field29 = "" and  
> Field30 = "" and  Field31 = "" and  Field32 = "" and  Field33 = "" and  
> Field34 = "" and  Field35 = "" and  Field36 = "" and  Field37 = "" and  
> Field38 = "" and  Field39 = "" and  Field40 = "" and  Field41 = "" and  
> Field42 = "" and  Field43 = "" and  Field44 = "" and  Field45 = "" and  
> Field46 = "" and  Field47 = "" and  Field48 = "" and  Field49 = "" and  
> Field50 = "" and  Field51 = "" and  Field52 = "" and  Field53 = "" and  
> Field54 = "" and  Field55 = "" and  Field56 = "" and  Field57 = "" and  
> Field58 = "" and  Field59 = "" and  Field60 = "" and  Field61 = "" and  
> Field62 = "" and  Field63 = "" and  Field64 = "" and  Field65 = "" and  
> Field66 = "" and  Field67 = "" and  Field68 = "" and  Field69 = "" and  
> Field70 = "" and  Field71 = "" and  Field72 = "" and  Field73 = "" and  
> Field74 = "" and  Field75 = "" and  Field76 = "" and  Field77 = "" and  
> Field78 = "" and  Field79 = "" and  Field80 = "" and  Field81 = "" and  
> Field82 = "" and  Field83 = "" and  Field84 = "" and  Field85 = "" and  
> Field86 = "" and  Field87 = "" and  Field88 = "" and  Field89 = "" and  
> Field90 = "" and  Field91 = "" and  Field92 = "" and  Field93 = "" and  
> Field94 = "" and  Field95 = "" and  Field96 = "" and  Field97 = "" and  
> Field98 = "" and  Field99 = "" and  Field100 = "" and  Field101 = "" and  
> Field102 = "" and  Field103 = "" and  Field104 = "" and  Field105 = "" and  
> Field106 = "" and  Field107 = "" and  Field108 = "" and  Field109 = "" and  
> Field110 = "" and  Field111 = "" and  Field112 = "" and  Field113 = "" and  
> Field114 = "" and  Field115 = "" and  Field116 = "" and  Field117 = "" and  
> Field118 = "" and  Field119 = "" and  Field120 = "" and  Field121 = "" and  
> Field122 = "" and  Field123 = "" and  Field124 = "" and  Field125 = "" and  
> Field126 = "" and  Field127 = "" and  Field128 = "" and  Field129 = "" and  
> Field130 = "" and  Field131 = "" and  Field132 = "" and  Field133 = "" and  
> Field134 = "" and  Field135 = "" and  Field136 = "" and  Field137 = "" and  
> Field138 = "" and  Field139 = "" and  Field140 = "" and  Field141 = "" and  
> Field142 = "" and  Field143 = "" and  Field144 = "" and  Field145 = "" and  
> Field146 = "" and  Field147 = "" and  Field148 = "" and  Field149 = "" and  
> Field150 = "" and  Field151 = "" and  Field152 = "" and  Field153 = "" and  
> Field154 = "" and  Field155 = "" and  Field156 = "" and  Field157 = "" and  
> Field158 = "" and  Field159 = "" and  Field160 = "" and  Field161 = "" and  
> Field162 = "" and  Field163 = "" and  Field164 = "" and  Field165 = "" and  
> Field166 = "" and  Field167 = "" and  Field168 = "" and  Field169 = "" and  
> Field170 = "" and  Field171 = "" and  Field172 = "" and  Field173 = "" and  
> Field174 = "" and  Field175 = "" and  Field176 = "" and 

[jira] [Comment Edited] (SPARK-21720) Filter predicate with many conditions throw stackoverflow error

2017-08-15 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127477#comment-16127477
 ] 

Kazuaki Ishizaki edited comment on SPARK-21720 at 8/15/17 4:26 PM:
---

In this case, adding the JVM option {{-Xss512m}} eliminates this exception and 
the query works well.

However, when the number of fields is 1024, I got the following exception:
{code}
08:41:40.022 ERROR 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator: failed to 
compile: org.codehaus.janino.JaninoRuntimeException: Code of method 
"apply(Lorg/apache/spark/sql/catalyst/InternalRow;)Lorg/apache/spark/sql/catalyst/expressions/UnsafeRow;"
 of class 
"org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection"
 grows beyond 64 KB
...
{code}

I am working on solving this 64KB problem.


was (Author: kiszk):
In this case, adding the JVM option {{-Xss512m}} eliminates this exception and 
the query works well.

When the number of fields is 1024, I got the following exception:
{code}
08:41:40.022 ERROR 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator: failed to 
compile: org.codehaus.janino.JaninoRuntimeException: Code of method 
"apply(Lorg/apache/spark/sql/catalyst/InternalRow;)Lorg/apache/spark/sql/catalyst/expressions/UnsafeRow;"
 of class 
"org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection"
 grows beyond 64 KB
...
{code}

I am working on solving this 64KB problem.

> Filter predicate with many conditions throw stackoverflow error
> ---
>
> Key: SPARK-21720
> URL: https://issues.apache.org/jira/browse/SPARK-21720
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: srinivasan
>
> When trying to filter on dataset with many predicate conditions on both spark 
> sql and dataset filter transformation as described below, spark throws a 
> stackoverflow exception
> Case 1: Filter Transformation on Data
> Dataset filter = sourceDataset.filter(String.format("not(%s)", 
> buildQuery()));
> filter.show();
> where buildQuery() returns
> Field1 = "" and  Field2 = "" and  Field3 = "" and  Field4 = "" and  Field5 = 
> "" and  BLANK_5 = "" and  Field7 = "" and  Field8 = "" and  Field9 = "" and  
> Field10 = "" and  Field11 = "" and  Field12 = "" and  Field13 = "" and  
> Field14 = "" and  Field15 = "" and  Field16 = "" and  Field17 = "" and  
> Field18 = "" and  Field19 = "" and  Field20 = "" and  Field21 = "" and  
> Field22 = "" and  Field23 = "" and  Field24 = "" and  Field25 = "" and  
> Field26 = "" and  Field27 = "" and  Field28 = "" and  Field29 = "" and  
> Field30 = "" and  Field31 = "" and  Field32 = "" and  Field33 = "" and  
> Field34 = "" and  Field35 = "" and  Field36 = "" and  Field37 = "" and  
> Field38 = "" and  Field39 = "" and  Field40 = "" and  Field41 = "" and  
> Field42 = "" and  Field43 = "" and  Field44 = "" and  Field45 = "" and  
> Field46 = "" and  Field47 = "" and  Field48 = "" and  Field49 = "" and  
> Field50 = "" and  Field51 = "" and  Field52 = "" and  Field53 = "" and  
> Field54 = "" and  Field55 = "" and  Field56 = "" and  Field57 = "" and  
> Field58 = "" and  Field59 = "" and  Field60 = "" and  Field61 = "" and  
> Field62 = "" and  Field63 = "" and  Field64 = "" and  Field65 = "" and  
> Field66 = "" and  Field67 = "" and  Field68 = "" and  Field69 = "" and  
> Field70 = "" and  Field71 = "" and  Field72 = "" and  Field73 = "" and  
> Field74 = "" and  Field75 = "" and  Field76 = "" and  Field77 = "" and  
> Field78 = "" and  Field79 = "" and  Field80 = "" and  Field81 = "" and  
> Field82 = "" and  Field83 = "" and  Field84 = "" and  Field85 = "" and  
> Field86 = "" and  Field87 = "" and  Field88 = "" and  Field89 = "" and  
> Field90 = "" and  Field91 = "" and  Field92 = "" and  Field93 = "" and  
> Field94 = "" and  Field95 = "" and  Field96 = "" and  Field97 = "" and  
> Field98 = "" and  Field99 = "" and  Field100 = "" and  Field101 = "" and  
> Field102 = "" and  Field103 = "" and  Field104 = "" and  Field105 = "" and  
> Field106 = "" and  Field107 = "" and  Field108 = "" and  Field109 = "" and  
> Field110 = "" and  Field111 = "" and  Field112 = "" and  Field113 = "" and  
> Field114 = "" and  Field115 = "" and  Field116 = "" and  Field117 = "" and  
> Field118 = "" and  Field119 = "" and  Field120 = "" and  Field121 = "" and  
> Field122 = "" and  Field123 = "" and  Field124 = "" and  Field125 = "" and  
> Field126 = "" and  Field127 = "" and  Field128 = "" and  Field129 = "" and  
> Field130 = "" and  Field131 = "" and  Field132 = "" and  Field133 = "" and  
> Field134 = "" and  Field135 = "" and  Field136 = "" and  Field137 = "" and  
> Field138 = "" and  Field139 = "" and  Field140 = "" and  Field141 = "" and  

[jira] [Commented] (SPARK-21720) Filter predicate with many conditions throw stackoverflow error

2017-08-13 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16125253#comment-16125253
 ] 

Kazuaki Ishizaki commented on SPARK-21720:
--

I confirmed that this occurs in the master branch. I will work on this.

> Filter predicate with many conditions throw stackoverflow error
> ---
>
> Key: SPARK-21720
> URL: https://issues.apache.org/jira/browse/SPARK-21720
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: srinivasan
>
> When trying to filter on dataset with many predicate conditions on both spark 
> sql and dataset filter transformation as described below, spark throws a 
> stackoverflow exception
> Case 1: Filter Transformation on Data
> Dataset filter = sourceDataset.filter(String.format("not(%s)", 
> buildQuery()));
> filter.show();
> where buildQuery() returns
> Field1 = "" and  Field2 = "" and  Field3 = "" and  Field4 = "" and  Field5 = 
> "" and  BLANK_5 = "" and  Field7 = "" and  Field8 = "" and  Field9 = "" and  
> Field10 = "" and  Field11 = "" and  Field12 = "" and  Field13 = "" and  
> Field14 = "" and  Field15 = "" and  Field16 = "" and  Field17 = "" and  
> Field18 = "" and  Field19 = "" and  Field20 = "" and  Field21 = "" and  
> Field22 = "" and  Field23 = "" and  Field24 = "" and  Field25 = "" and  
> Field26 = "" and  Field27 = "" and  Field28 = "" and  Field29 = "" and  
> Field30 = "" and  Field31 = "" and  Field32 = "" and  Field33 = "" and  
> Field34 = "" and  Field35 = "" and  Field36 = "" and  Field37 = "" and  
> Field38 = "" and  Field39 = "" and  Field40 = "" and  Field41 = "" and  
> Field42 = "" and  Field43 = "" and  Field44 = "" and  Field45 = "" and  
> Field46 = "" and  Field47 = "" and  Field48 = "" and  Field49 = "" and  
> Field50 = "" and  Field51 = "" and  Field52 = "" and  Field53 = "" and  
> Field54 = "" and  Field55 = "" and  Field56 = "" and  Field57 = "" and  
> Field58 = "" and  Field59 = "" and  Field60 = "" and  Field61 = "" and  
> Field62 = "" and  Field63 = "" and  Field64 = "" and  Field65 = "" and  
> Field66 = "" and  Field67 = "" and  Field68 = "" and  Field69 = "" and  
> Field70 = "" and  Field71 = "" and  Field72 = "" and  Field73 = "" and  
> Field74 = "" and  Field75 = "" and  Field76 = "" and  Field77 = "" and  
> Field78 = "" and  Field79 = "" and  Field80 = "" and  Field81 = "" and  
> Field82 = "" and  Field83 = "" and  Field84 = "" and  Field85 = "" and  
> Field86 = "" and  Field87 = "" and  Field88 = "" and  Field89 = "" and  
> Field90 = "" and  Field91 = "" and  Field92 = "" and  Field93 = "" and  
> Field94 = "" and  Field95 = "" and  Field96 = "" and  Field97 = "" and  
> Field98 = "" and  Field99 = "" and  Field100 = "" and  Field101 = "" and  
> Field102 = "" and  Field103 = "" and  Field104 = "" and  Field105 = "" and  
> Field106 = "" and  Field107 = "" and  Field108 = "" and  Field109 = "" and  
> Field110 = "" and  Field111 = "" and  Field112 = "" and  Field113 = "" and  
> Field114 = "" and  Field115 = "" and  Field116 = "" and  Field117 = "" and  
> Field118 = "" and  Field119 = "" and  Field120 = "" and  Field121 = "" and  
> Field122 = "" and  Field123 = "" and  Field124 = "" and  Field125 = "" and  
> Field126 = "" and  Field127 = "" and  Field128 = "" and  Field129 = "" and  
> Field130 = "" and  Field131 = "" and  Field132 = "" and  Field133 = "" and  
> Field134 = "" and  Field135 = "" and  Field136 = "" and  Field137 = "" and  
> Field138 = "" and  Field139 = "" and  Field140 = "" and  Field141 = "" and  
> Field142 = "" and  Field143 = "" and  Field144 = "" and  Field145 = "" and  
> Field146 = "" and  Field147 = "" and  Field148 = "" and  Field149 = "" and  
> Field150 = "" and  Field151 = "" and  Field152 = "" and  Field153 = "" and  
> Field154 = "" and  Field155 = "" and  Field156 = "" and  Field157 = "" and  
> Field158 = "" and  Field159 = "" and  Field160 = "" and  Field161 = "" and  
> Field162 = "" and  Field163 = "" and  Field164 = "" and  Field165 = "" and  
> Field166 = "" and  Field167 = "" and  Field168 = "" and  Field169 = "" and  
> Field170 = "" and  Field171 = "" and  Field172 = "" and  Field173 = "" and  
> Field174 = "" and  Field175 = "" and  Field176 = "" and  Field177 = "" and  
> Field178 = "" and  Field179 = "" and  Field180 = "" and  Field181 = "" and  
> Field182 = "" and  Field183 = "" and  Field184 = "" and  Field185 = "" and  
> Field186 = "" and  Field187 = "" and  Field188 = "" and  Field189 = "" and  
> Field190 = "" and  Field191 = "" and  Field192 = "" and  Field193 = "" and  
> Field194 = "" and  Field195 = "" and  Field196 = "" and  Field197 = "" and  
> Field198 = "" and  Field199 = "" and  Field200 = "" and  Field201 = "" and  
> Field202 = "" and  Field203 = "" and  F

[jira] [Comment Edited] (SPARK-19372) Code generation for Filter predicate including many OR conditions exceeds JVM method size limit

2017-08-13 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124982#comment-16124982
 ] 

Kazuaki Ishizaki edited comment on SPARK-19372 at 8/13/17 5:05 PM:
---

[~srinivasanm] I can reproduce this issue using the master branch. I think 
that this is a different problem.
Could you please create another JIRA entry to track this issue? I will work on 
this.



was (Author: kiszk):
[~srinivasanm] I can reproduce this issue using the master branch. I think 
that this is a different problem.
Could you please create another JIRA entry to track this issue?


> Code generation for Filter predicate including many OR conditions exceeds JVM 
> method size limit 
> 
>
> Key: SPARK-19372
> URL: https://issues.apache.org/jira/browse/SPARK-19372
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Jay Pranavamurthi
>Assignee: Kazuaki Ishizaki
> Fix For: 2.2.0, 2.3.0
>
> Attachments: wide400cols.csv
>
>
> For the attached csv file, the code below causes the exception 
> "org.codehaus.janino.JaninoRuntimeException: Code of method 
> "(Lorg/apache/spark/sql/catalyst/InternalRow;)Z" of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate" 
> grows beyond 64 KB
> Code:
> {code:borderStyle=solid}
>   val conf = new SparkConf().setMaster("local[1]")
>   val sqlContext = 
> SparkSession.builder().config(conf).getOrCreate().sqlContext
>   val dataframe =
> sqlContext
>   .read
>   .format("com.databricks.spark.csv")
>   .load("wide400cols.csv")
>   val filter = (0 to 399)
> .foldLeft(lit(false))((e, index) => 
> e.or(dataframe.col(dataframe.columns(index)) =!= s"column${index+1}"))
>   val filtered = dataframe.filter(filter)
>   filtered.show(100)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19372) Code generation for Filter predicate including many OR conditions exceeds JVM method size limit

2017-08-13 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124982#comment-16124982
 ] 

Kazuaki Ishizaki commented on SPARK-19372:
--

[~srinivasanm] I can reproduce this issue using the master branch. I think 
this is a different problem.
Could you please create another JIRA entry to track it?


> Code generation for Filter predicate including many OR conditions exceeds JVM 
> method size limit 
> 
>
> Key: SPARK-19372
> URL: https://issues.apache.org/jira/browse/SPARK-19372
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Jay Pranavamurthi
>Assignee: Kazuaki Ishizaki
> Fix For: 2.2.0, 2.3.0
>
> Attachments: wide400cols.csv
>
>
> For the attached csv file, the code below causes the exception 
> "org.codehaus.janino.JaninoRuntimeException: Code of method 
> "(Lorg/apache/spark/sql/catalyst/InternalRow;)Z" of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate" 
> grows beyond 64 KB
> Code:
> {code:borderStyle=solid}
>   val conf = new SparkConf().setMaster("local[1]")
>   val sqlContext = 
> SparkSession.builder().config(conf).getOrCreate().sqlContext
>   val dataframe =
> sqlContext
>   .read
>   .format("com.databricks.spark.csv")
>   .load("wide400cols.csv")
>   val filter = (0 to 399)
> .foldLeft(lit(false))((e, index) => 
> e.or(dataframe.col(dataframe.columns(index)) =!= s"column${index+1}"))
>   val filtered = dataframe.filter(filter)
>   filtered.show(100)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19372) Code generation for Filter predicate including many OR conditions exceeds JVM method size limit

2017-08-13 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124856#comment-16124856
 ] 

Kazuaki Ishizaki commented on SPARK-19372:
--

Thank you for letting us know about the problem. I will investigate it.

> Code generation for Filter predicate including many OR conditions exceeds JVM 
> method size limit 
> 
>
> Key: SPARK-19372
> URL: https://issues.apache.org/jira/browse/SPARK-19372
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Jay Pranavamurthi
>Assignee: Kazuaki Ishizaki
> Fix For: 2.2.0, 2.3.0
>
> Attachments: wide400cols.csv
>
>
> For the attached csv file, the code below causes the exception 
> "org.codehaus.janino.JaninoRuntimeException: Code of method 
> "(Lorg/apache/spark/sql/catalyst/InternalRow;)Z" of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate" 
> grows beyond 64 KB
> Code:
> {code:borderStyle=solid}
>   val conf = new SparkConf().setMaster("local[1]")
>   val sqlContext = 
> SparkSession.builder().config(conf).getOrCreate().sqlContext
>   val dataframe =
> sqlContext
>   .read
>   .format("com.databricks.spark.csv")
>   .load("wide400cols.csv")
>   val filter = (0 to 399)
> .foldLeft(lit(false))((e, index) => 
> e.or(dataframe.col(dataframe.columns(index)) =!= s"column${index+1}"))
>   val filtered = dataframe.filter(filter)
>   filtered.show(100)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21276) Update lz4-java to remove custom LZ4BlockInputStream

2017-08-08 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16118402#comment-16118402
 ] 

Kazuaki Ishizaki commented on SPARK-21276:
--

Is it better to update the affected version?

> Update  lz4-java to remove custom LZ4BlockInputStream
> -
>
> Key: SPARK-21276
> URL: https://issues.apache.org/jira/browse/SPARK-21276
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.1.1
>Reporter: Takeshi Yamamuro
>Priority: Trivial
>
> We currently use custom LZ4BlockInputStream to read concatenated byte stream 
> in shuffle 
> (https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/io/LZ4BlockInputStream.java#L38).
>  In the recent PR (https://github.com/lz4/lz4-java/pull/105), this 
> functionality has been implemented in lz4-java upstream. So, we might update to 
> the lz4-java package that will be released in the near future.
> Issue about the next lz4-java release
> https://github.com/lz4/lz4-java/issues/98
> Diff between the latest release and the master in lz4-java
> https://github.com/lz4/lz4-java/compare/62f7547abb0819d1ca1e669645ee1a9d26cd60b0...6480bd9e06f92471bf400c16d4d5f3fd2afa3b3d
>  * fixed NPE in XXHashFactory similarly
>  * Don't place resources in default package to support shading
>  * Fixes ByteBuffer methods failing to apply arrayOffset() for array-backed
>  * Try to load lz4-java from java.library.path, then fallback to bundled
>  * Add ppc64le binary
>  * Add s390x JNI binding
>  * Add basic LZ4 Frame v1.5.0 support
>  * enable aarch64 support for lz4-java
>  * Allow unsafeInstance() for ppc64le archiecture
>  * Add unsafeInstance support for AArch64
>  * Support 64-bit JNI build on Solaris
>  * Avoid over-allocating a buffer
>  * Allow EndMark to be incompressible for LZ4FrameInputStream.
>  * Concat byte stream



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21390) Dataset filter api inconsistency

2017-08-02 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16110576#comment-16110576
 ] 

Kazuaki Ishizaki commented on SPARK-21390:
--

Thank you very much for pointing out the relevant JIRA entry. I will check it.

> Dataset filter api inconsistency
> 
>
> Key: SPARK-21390
> URL: https://issues.apache.org/jira/browse/SPARK-21390
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1, 2.1.0, 2.2.0
>Reporter: Gheorghe Gheorghe
>Priority: Minor
>
> Hello everybody, 
> I've encountered a strange situation with the spark-shell.
> When I run the code below in my IDE the second test case prints as expected 
> count "1". However, when I run the same code using the spark-shell in the 
> second test case I get 0 back as a count. 
> I've made sure that I'm running scala 2.11.8 and spark 2.0.1 in both my IDE 
> and spark-shell. 
> {code:java}
>   import org.apache.spark.sql.Dataset
>   case class SomeClass(field1:String, field2:String)
>   val filterCondition: Seq[SomeClass] = Seq( SomeClass("00", "01") )
>   // Test 1
>   val filterMe1: Dataset[SomeClass] = Seq( SomeClass("00", "01") ).toDS
>   
>   println("Works fine!" +filterMe1.filter(filterCondition.contains(_)).count)
>   
>   // Test 2
>   case class OtherClass(field1:String, field2:String)
>   
>   val filterMe2 = Seq( OtherClass("00", "01"), OtherClass("00", "02")).toDS
>   println("Fail, count should return 1: " + filterMe2.filter(x=> 
> filterCondition.contains(SomeClass(x.field1, x.field2))).count)
> {code}
> Note if I transform the dataset first I get 1 back as expected.
> {code:java}
>  println(filterMe2.map(x=> SomeClass(x.field1, 
> x.field2)).filter(filterCondition.contains(_)).count)
> {code}
> Is this a bug? I can see that this filter function has been marked as 
> experimental 
> https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21591) Implement treeAggregate on Dataset API

2017-08-01 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16108644#comment-16108644
 ] 

Kazuaki Ishizaki commented on SPARK-21591:
--

I like this idea.

> Implement treeAggregate on Dataset API
> --
>
> Key: SPARK-21591
> URL: https://issues.apache.org/jira/browse/SPARK-21591
> Project: Spark
>  Issue Type: Brainstorming
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Yanbo Liang
>
> The Tungsten execution engine substantially improved the memory and CPU 
> efficiency of Spark applications. However, in MLlib we still have not migrated the 
> internal computing workload from {{RDD}} to {{DataFrame}}.
> One of the blocking issues is that there is no {{treeAggregate}} on {{DataFrame}}. 
> It is very important for MLlib algorithms, since they aggregate over 
> {{Vector}} values which may have millions of elements. As we all know, {{RDD}}-based 
> {{treeAggregate}} reduces the aggregation time by an order of magnitude for 
> lots of MLlib 
> algorithms (https://databricks.com/blog/2014/09/22/spark-1-1-mllib-performance-improvements.html).
> I open this JIRA to discuss implementing {{treeAggregate}} on the {{DataFrame}} 
> API and the related performance benchmark work. And I think scenarios other 
> than MLlib will also benefit from this improvement if we get it done.
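
For context, a minimal, hedged sketch of the existing RDD-based {{treeAggregate}} that a Dataset/DataFrame equivalent would need to mirror; the vector size, record count, and {{depth}} are illustrative values, not taken from this issue:

{code:java}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("treeAggregateSketch").getOrCreate()

// Each record is a dense vector; partial sums are combined in a multi-level tree
// (depth = 2 here) instead of being sent straight to the driver, which is what
// makes treeAggregate cheaper than aggregate for long vectors.
val dim = 1000
val vectors = spark.sparkContext.parallelize(1 to 100000).map(i => Array.fill(dim)(i.toDouble))

val summed = vectors.treeAggregate(new Array[Double](dim))(
  (acc, v) => { var j = 0; while (j < dim) { acc(j) += v(j); j += 1 }; acc },
  (a, b) => { var j = 0; while (j < dim) { a(j) += b(j); j += 1 }; a },
  depth = 2)
{code}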



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18016) Code Generation: Constant Pool Past Limit for Wide/Nested Dataset

2017-07-27 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16104284#comment-16104284
 ] 

Kazuaki Ishizaki commented on SPARK-18016:
--

[~jamcon] Thank you for reporting the problem.
We fixed a problem for a large number of columns (e.g. 4000). However, we 
know that we have not yet solved the problem for a very large number of columns 
(e.g. 12000).
I have just pinged the author of that fix about solving these two problems.
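
For anyone who wants to check the not-yet-fixed case locally, a minimal, hedged sketch that builds a very wide single-row DataFrame; the column count of 12000 is only the illustrative figure mentioned above, and on a fixed build the {{collect()}} should simply succeed:

{code:java}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

val spark = SparkSession.builder().master("local[*]").appName("wideSchemaSketch").getOrCreate()

// A single row with a very wide schema forces a huge generated projection class;
// with enough columns its constant pool can exceed the JVM limit of 0xFFFF entries.
val numCols = 12000
val wide = spark.range(1).select((0 until numCols).map(i => lit(i).as(s"c$i")): _*)
wide.collect()
{code}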


> Code Generation: Constant Pool Past Limit for Wide/Nested Dataset
> -
>
> Key: SPARK-18016
> URL: https://issues.apache.org/jira/browse/SPARK-18016
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Aleksander Eskilson
>Assignee: Aleksander Eskilson
> Fix For: 2.3.0
>
>
> When attempting to encode collections of large Java objects to Datasets 
> having very wide or deeply nested schemas, code generation can fail, yielding:
> {code}
> Caused by: org.codehaus.janino.JaninoRuntimeException: Constant pool for 
> class 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection
>  has grown past JVM limit of 0x
>   at 
> org.codehaus.janino.util.ClassFile.addToConstantPool(ClassFile.java:499)
>   at 
> org.codehaus.janino.util.ClassFile.addConstantNameAndTypeInfo(ClassFile.java:439)
>   at 
> org.codehaus.janino.util.ClassFile.addConstantMethodrefInfo(ClassFile.java:358)
>   at 
> org.codehaus.janino.UnitCompiler.writeConstantMethodrefInfo(UnitCompiler.java:4)
>   at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4547)
>   at org.codehaus.janino.UnitCompiler.access$7500(UnitCompiler.java:206)
>   at 
> org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:3774)
>   at 
> org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:3762)
>   at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:4328)
>   at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3762)
>   at 
> org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4933)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:3180)
>   at org.codehaus.janino.UnitCompiler.access$5000(UnitCompiler.java:206)
>   at 
> org.codehaus.janino.UnitCompiler$9.visitMethodInvocation(UnitCompiler.java:3151)
>   at 
> org.codehaus.janino.UnitCompiler$9.visitMethodInvocation(UnitCompiler.java:3139)
>   at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:4328)
>   at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3139)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2112)
>   at org.codehaus.janino.UnitCompiler.access$1700(UnitCompiler.java:206)
>   at 
> org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1377)
>   at 
> org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1370)
>   at org.codehaus.janino.Java$ExpressionStatement.accept(Java.java:2558)
>   at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:1370)
>   at 
> org.codehaus.janino.UnitCompiler.compileStatements(UnitCompiler.java:1450)
>   at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:2811)
>   at 
> org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1262)
>   at 
> org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1234)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:538)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:890)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:894)
>   at org.codehaus.janino.UnitCompiler.access$600(UnitCompiler.java:206)
>   at 
> org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:377)
>   at 
> org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:369)
>   at 
> org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1128)
>   at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:369)
>   at 
> org.codehaus.janino.UnitCompiler.compileDeclaredMemberTypes(UnitCompiler.java:1209)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:564)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:420)
>   at org.codehaus.janino.UnitCompiler.access$400(UnitCompiler.java:206)
>   at 
> org.codehaus.janino.UnitCompiler$2.visitPackageMemberClassDeclaration(UnitCompiler.java:374)
>   at 
> org.codehaus.janino.UnitCompiler$2.visitPackageMemberClassDeclaration(UnitCompiler.java:369)
>   at 
> org.codehaus.janino.Java$AbstractPackageMembe

[jira] [Commented] (SPARK-21496) Support codegen for TakeOrderedAndProjectExec

2017-07-25 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16099840#comment-16099840
 ] 

Kazuaki Ishizaki commented on SPARK-21496:
--

Is there any good benchmark program for this?

> Support codegen for TakeOrderedAndProjectExec
> -
>
> Key: SPARK-21496
> URL: https://issues.apache.org/jira/browse/SPARK-21496
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Jiang Xingbo
>Priority: Minor
>
> The operator `SortExec` supports codegen, but `TakeOrderedAndProjectExec` 
> doesn't. Perhaps we should also add codegen support for 
> `TakeOrderedAndProjectExec`, but we should also benchmark it carefully.
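
As a starting point, a small, hedged timing sketch of a query shape that is planned as {{TakeOrderedAndProjectExec}} (ORDER BY + LIMIT followed by a projection); the row count and expressions are illustrative, and a real benchmark would of course need warm-up and repeated runs:

{code:java}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("topKSketch").getOrCreate()

// ORDER BY + LIMIT + projection goes through TakeOrderedAndProjectExec, so timing
// this query before and after a codegen change gives a rough comparison.
val df = spark.range(0, 10L * 1000 * 1000).selectExpr("id", "id % 97 AS key")
spark.time {
  df.orderBy("key").limit(100).selectExpr("key * 2 AS k2").collect()
}
{code}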



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21517) Fetch local data via block manager cause oom

2017-07-25 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16099654#comment-16099654
 ] 

Kazuaki Ishizaki commented on SPARK-21517:
--

Does it occur in Spark 2.2?

> Fetch local data via block manager cause oom
> 
>
> Key: SPARK-21517
> URL: https://issues.apache.org/jira/browse/SPARK-21517
> Project: Spark
>  Issue Type: Improvement
>  Components: Block Manager, Spark Core
>Affects Versions: 1.6.1, 2.1.0
>Reporter: zhoukang
>
> In our production cluster, OOM happens when NettyBlockRpcServer receives an 
> OpenBlocks message. The reason we observed is below:
> When BlockManagerManagedBuffer calls ChunkedByteBuffer#toNetty, it uses 
> Unpooled.wrappedBuffer(ByteBuffer... buffers), which uses the default 
> maxNumComponents=16 in the low-level CompositeByteBuf. When the number of our 
> components is bigger than 16, it executes the following during the buffer copy:
> {code:java}
> private void consolidateIfNeeded() {
> int numComponents = this.components.size();
> if(numComponents > this.maxNumComponents) {
> int capacity = 
> ((CompositeByteBuf.Component)this.components.get(numComponents - 
> 1)).endOffset;
> ByteBuf consolidated = this.allocBuffer(capacity);
> for(int c = 0; c < numComponents; ++c) {
> CompositeByteBuf.Component c1 = 
> (CompositeByteBuf.Component)this.components.get(c);
> ByteBuf b = c1.buf;
> consolidated.writeBytes(b);
> c1.freeIfNecessary();
> }
> CompositeByteBuf.Component var7 = new 
> CompositeByteBuf.Component(consolidated);
> var7.endOffset = var7.length;
> this.components.clear();
> this.components.add(var7);
> }
> }
> {code}
> This consolidation in CompositeByteBuf consumes additional memory during the buffer copy.
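
One possible direction, shown only as a hedged sketch rather than a patch: pass an explicit {{maxNumComponents}} that is at least the number of chunks, so the composite buffer never reaches the consolidation path above:

{code:java}
import java.nio.ByteBuffer
import io.netty.buffer.{ByteBuf, Unpooled}

// With maxNumComponents >= chunks.length, CompositeByteBuf keeps the original
// chunks as components and never performs the copying consolidation shown above.
def toNettyWithoutConsolidation(chunks: Array[ByteBuffer]): ByteBuf =
  Unpooled.wrappedBuffer(math.max(chunks.length, 1), chunks: _*)
{code}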



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21501) Spark shuffle index cache size should be memory based

2017-07-24 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16099351#comment-16099351
 ] 

Kazuaki Ishizaki commented on SPARK-21501:
--

I see. I misunderstood the description.
You expect that the memory cache would still accept entries even when the number of 
entries is larger than {{spark.shuffle.service.index.cache.entries}}, as long as the 
total cache size is not large.

> Spark shuffle index cache size should be memory based
> -
>
> Key: SPARK-21501
> URL: https://issues.apache.org/jira/browse/SPARK-21501
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 2.1.0
>Reporter: Thomas Graves
>
> Right now the spark shuffle service has a cache for index files. It is based 
> on a # of files cached (spark.shuffle.service.index.cache.entries). This can 
> cause issues if people have a lot of reducers because the size of each entry 
> can fluctuate based on the # of reducers. 
> We saw an issue with a job that had 17 reducers; it caused the NM running the 
> spark shuffle service to use 700-800MB of memory by itself.
> We should change this cache to be memory based and only allow a certain 
> memory size used. When I say memory based I mean the cache should have a 
> limit of say 100MB.
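
A hedged sketch of what a size-bounded cache could look like with Guava's weight-based eviction; the key/value types and the 100MB figure are illustrative, not the actual shuffle-service code:

{code:java}
import com.google.common.cache.{Cache, CacheBuilder, Weigher}

// Weigh each entry by the size of its cached index bytes instead of counting entries.
val weigher = new Weigher[String, Array[Byte]] {
  override def weigh(indexFile: String, indexBytes: Array[Byte]): Int = indexBytes.length
}

// Evict by total weight, so many small entries and a few huge ones fit the same budget.
val indexCache: Cache[String, Array[Byte]] = CacheBuilder.newBuilder()
  .maximumWeight(100L * 1024 * 1024)  // ~100MB budget, as suggested in the description
  .weigher[String, Array[Byte]](weigher)
  .build[String, Array[Byte]]()
{code}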



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-21387) org.apache.spark.memory.TaskMemoryManager.allocatePage causes OOM

2017-07-24 Thread Kazuaki Ishizaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki closed SPARK-21387.

Resolution: Cannot Reproduce

> org.apache.spark.memory.TaskMemoryManager.allocatePage causes OOM
> -
>
> Key: SPARK-21387
> URL: https://issues.apache.org/jira/browse/SPARK-21387
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-21387) org.apache.spark.memory.TaskMemoryManager.allocatePage causes OOM

2017-07-24 Thread Kazuaki Ishizaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki closed SPARK-21387.

Resolution: Fixed

> org.apache.spark.memory.TaskMemoryManager.allocatePage causes OOM
> -
>
> Key: SPARK-21387
> URL: https://issues.apache.org/jira/browse/SPARK-21387
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-21387) org.apache.spark.memory.TaskMemoryManager.allocatePage causes OOM

2017-07-24 Thread Kazuaki Ishizaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki reopened SPARK-21387:
--

> org.apache.spark.memory.TaskMemoryManager.allocatePage causes OOM
> -
>
> Key: SPARK-21387
> URL: https://issues.apache.org/jira/browse/SPARK-21387
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21387) org.apache.spark.memory.TaskMemoryManager.allocatePage causes OOM

2017-07-24 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16098545#comment-16098545
 ] 

Kazuaki Ishizaki commented on SPARK-21387:
--

Although I got an OOM in my unit test, I have to reinvestigate whether the unit 
test reflects the actual restrictions.

> org.apache.spark.memory.TaskMemoryManager.allocatePage causes OOM
> -
>
> Key: SPARK-21387
> URL: https://issues.apache.org/jira/browse/SPARK-21387
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21501) Spark shuffle index cache size should be memory based

2017-07-24 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16098531#comment-16098531
 ] 

Kazuaki Ishizaki commented on SPARK-21501:
--

I guess that using Spark 2.1 or a later version alleviates this issue thanks to 
SPARK-15074.

> Spark shuffle index cache size should be memory based
> -
>
> Key: SPARK-21501
> URL: https://issues.apache.org/jira/browse/SPARK-21501
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 2.0.0
>Reporter: Thomas Graves
>
> Right now the spark shuffle service has a cache for index files. It is based 
> on a # of files cached (spark.shuffle.service.index.cache.entries). This can 
> cause issues if people have a lot of reducers because the size of each entry 
> can fluctuate based on the # of reducers. 
> We saw an issue with a job that had 17 reducers; it caused the NM running the 
> spark shuffle service to use 700-800MB of memory by itself.
> We should change this cache to be memory based and only allow a certain 
> memory size used. When I say memory based I mean the cache should have a 
> limit of say 100MB.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21516) overriding afterEach() in DatasetCacheSuite must call super.afterEach()

2017-07-23 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-21516:


 Summary: overriding afterEach() in DatasetCacheSuite must call 
super.afterEach()
 Key: SPARK-21516
 URL: https://issues.apache.org/jira/browse/SPARK-21516
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.0
Reporter: Kazuaki Ishizaki


When we override the {{afterEach()}} method in a test suite, we have to call 
{{super.afterEach()}}. This is a follow-up of SPARK-21512.
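
A minimal sketch of the pattern with a hypothetical suite (not actual Spark test code):

{code:java}
import org.scalatest.{BeforeAndAfterEach, FunSuite}

class ExampleCacheSuite extends FunSuite with BeforeAndAfterEach {

  override def afterEach(): Unit = {
    try {
      // suite-specific cleanup goes here, e.g. unpersist any Dataset cached by the test
    } finally {
      super.afterEach()  // always delegate so the cleanup in the parent trait still runs
    }
  }

  test("example") {
    assert(1 + 1 === 2)
  }
}
{code}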



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-21512) DatasetCacheSuite needs to execute unpersistent after executing peristent

2017-07-23 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097565#comment-16097565
 ] 

Kazuaki Ishizaki edited comment on SPARK-21512 at 7/24/17 4:53 AM:
---

When {{DatasetCacheSuite}} is executed, the following warning messages appear. 
A dataset that should already be unpersistent is made persistent again in the second 
test case {{"persist and then rebind right encoder when join 2 datasets"}} after the 
first test case {{"get storage level"}} made it persistent.
Thus, when we run these test cases together, the second case does not actually make 
the dataset persistent because it is already cached. When we run only the second case, 
it does make the dataset persistent. It is not good that the behavior of the second 
test case changes depending on the first one. The first test case should correctly 
make the dataset unpersistent.

{code}
01:52:48.595 WARN org.apache.spark.sql.execution.CacheManager: Asked to cache 
already cached data.
01:52:48.692 WARN org.apache.spark.sql.execution.CacheManager: Asked to cache 
already cached data.
{code}


was (Author: kiszk):
When {DatasetCacheSuite} is executed, the following warning messages appear. 
Unpersistent dataset is made persistent in the second test case {{"persist and 
then rebind right encoder when join 2 datasets"}} after the first test case 
{{"get storage level"}} made it persistent.
Thus, we run these test cases, the second case does not perform to make dataset 
persistent. This is because in 
 When we run only the second case, it performs to make dataset persistent. It 
is not good to change behavior of the second test suite. The first test case 
should correctly make dataset unpersistent.

{code}
01:52:48.595 WARN org.apache.spark.sql.execution.CacheManager: Asked to cache 
already cached data.
01:52:48.692 WARN org.apache.spark.sql.execution.CacheManager: Asked to cache 
already cached data.
{code}

> DatasetCacheSuite needs to execute unpersistent after executing peristent
> -
>
> Key: SPARK-21512
> URL: https://issues.apache.org/jira/browse/SPARK-21512
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>Assignee: Kazuaki Ishizaki
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21512) DatasetCacheSuite needs to execute unpersistent after executing peristent

2017-07-23 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097565#comment-16097565
 ] 

Kazuaki Ishizaki commented on SPARK-21512:
--

When {{DatasetCacheSuite}} is executed, the following warning messages appear. 
A dataset that should already be unpersistent is made persistent again in the second 
test case {{"persist and then rebind right encoder when join 2 datasets"}} after the 
first test case {{"get storage level"}} made it persistent.
Thus, when we run these test cases together, the second case does not actually make 
the dataset persistent because it is already cached. When we run only the second case, 
it does make the dataset persistent. It is not good that the behavior of the second 
test case changes depending on the first one. The first test case should correctly 
make the dataset unpersistent.

{code}
01:52:48.595 WARN org.apache.spark.sql.execution.CacheManager: Asked to cache 
already cached data.
01:52:48.692 WARN org.apache.spark.sql.execution.CacheManager: Asked to cache 
already cached data.
{code}
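
A hedged sketch of the kind of cleanup the first test case needs; the Dataset and assertion are illustrative, and {{spark}} and {{test}} are assumed to come from the enclosing suite:

{code:java}
import org.apache.spark.storage.StorageLevel

test("get storage level") {
  val ds = spark.range(10)  // any small Dataset works for the illustration
  ds.persist()
  try {
    assert(ds.storageLevel != StorageLevel.NONE)
  } finally {
    // make the Dataset unpersistent again so the next test case starts from a clean cache
    ds.unpersist()
  }
}
{code}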

> DatasetCacheSuite needs to execute unpersistent after executing peristent
> -
>
> Key: SPARK-21512
> URL: https://issues.apache.org/jira/browse/SPARK-21512
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21512) DatasetCacheSuite needs to execute unpersistent after executing peristent

2017-07-23 Thread Kazuaki Ishizaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki updated SPARK-21512:
-
Summary: DatasetCacheSuite needs to execute unpersistent after executing 
peristent  (was: DatasetCacheSuites need to execute unpersistent after 
executing peristent)

> DatasetCacheSuite needs to execute unpersistent after executing peristent
> -
>
> Key: SPARK-21512
> URL: https://issues.apache.org/jira/browse/SPARK-21512
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21512) DatasetCacheSuites need to execute unpersistent after executing peristent

2017-07-23 Thread Kazuaki Ishizaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki updated SPARK-21512:
-
Summary: DatasetCacheSuites need to execute unpersistent after executing 
peristent  (was: DatasetCacheSuite need to execute unpersistent after executing 
peristent)

> DatasetCacheSuites need to execute unpersistent after executing peristent
> -
>
> Key: SPARK-21512
> URL: https://issues.apache.org/jira/browse/SPARK-21512
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21512) DatasetCacheSuite need to execute unpersistent after executing peristent

2017-07-22 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-21512:


 Summary: DatasetCacheSuite need to execute unpersistent after 
executing peristent
 Key: SPARK-21512
 URL: https://issues.apache.org/jira/browse/SPARK-21512
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.0
Reporter: Kazuaki Ishizaki






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20822) Generate code to get value from ColumnVector in ColumnarBatch

2017-07-20 Thread Kazuaki Ishizaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki updated SPARK-20822:
-
Summary: Generate code to get value from ColumnVector in ColumnarBatch  
(was: Generate code to build table cache using ColumnarBatch and to get value 
from ColumnVector)

> Generate code to get value from ColumnVector in ColumnarBatch
> -
>
> Key: SPARK-20822
> URL: https://issues.apache.org/jira/browse/SPARK-20822
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20822) Generate code to get value from CachedBatchColumnVector in ColumnarBatch

2017-07-20 Thread Kazuaki Ishizaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki updated SPARK-20822:
-
Summary: Generate code to get value from CachedBatchColumnVector in 
ColumnarBatch  (was: Generate code to get value from ColumnVector in 
ColumnarBatch)

> Generate code to get value from CachedBatchColumnVector in ColumnarBatch
> 
>
> Key: SPARK-20822
> URL: https://issues.apache.org/jira/browse/SPARK-20822
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21443) Very long planning duration for queries with lots of operations

2017-07-17 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16090242#comment-16090242
 ] 

Kazuaki Ishizaki commented on SPARK-21443:
--

These two optimizations, {{InferFiltersFromConstraints}} and {{PruneFilters}}, are 
known to be time-consuming.

Since it is not easy to fix the root cause, the Spark community introduced 
an option, {{spark.sql.constraintPropagation.enabled}}, to disable these 
optimizations in [this PR|https://github.com/apache/spark/pull/17186].
Is it possible to alleviate the problem by using this option?
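
For example, a hedged sketch of how the option can be set, either per session or at submit time:

{code:java}
// Disable the costly constraint-propagation rules for the current session only.
spark.conf.set("spark.sql.constraintPropagation.enabled", "false")

// or, at submit time:
// spark-submit --conf spark.sql.constraintPropagation.enabled=false ...
{code}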

> Very long planning duration for queries with lots of operations
> ---
>
> Key: SPARK-21443
> URL: https://issues.apache.org/jira/browse/SPARK-21443
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Structured Streaming
>Affects Versions: 2.2.0
>Reporter: Eyal Zituny
>
> Creating a streaming query with a large number of operations and fields (100+) 
> results in a very long query planning phase. In the example below, the planning 
> phase took 35 seconds while the actual batch execution took only 1.3 
> seconds.
> After some investigation, I have found that the root causes of this are 2 
> optimizer rules which seem to take most of the planning time: 
> InferFiltersFromConstraints and PruneFilters
> I would suggest the following:
> # fix the inefficient optimizer rules
> # add warn-level logging if a rule has taken more than xx ms
> # allow custom removing of optimizer rules (opposite to 
> spark.experimental.extraOptimizations)
> # reuse query plans (optional) where possible
> Reproducing this issue can be done with the below script, which simulates the 
> scenario:
> {code:java}
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.execution.streaming.MemoryStream
> import 
> org.apache.spark.sql.streaming.StreamingQueryListener.{QueryProgressEvent, 
> QueryStartedEvent, QueryTerminatedEvent}
> import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQueryListener}
> case class Product(pid: Long, name: String, price: Long, ts: Long = 
> System.currentTimeMillis())
> case class Events (eventId: Long, eventName: String, productId: Long) {
>   def this(id: Long) = this(id, s"event$id", id%100)
> }
> object SparkTestFlow {
>   def main(args: Array[String]): Unit = {
>   val spark = SparkSession
> .builder
> .appName("TestFlow")
> .master("local[8]")
> .getOrCreate()
>   spark.sqlContext.streams.addListener(new StreamingQueryListener 
> {
>   override def onQueryTerminated(event: 
> QueryTerminatedEvent): Unit = {}
>   override def onQueryProgress(event: 
> QueryProgressEvent): Unit = {
>   if (event.progress.numInputRows>0) {
>   println(event.progress.toString())
>   }
>   }
>   override def onQueryStarted(event: QueryStartedEvent): 
> Unit = {}
>   })
>   
>   import spark.implicits._
>   implicit val  sclContext = spark.sqlContext
>   import org.apache.spark.sql.functions.expr
>   val seq = (1L to 100L).map(i => Product(i, s"name$i", 10L*i))
>   val lookupTable = spark.createDataFrame(seq)
>   val inputData = MemoryStream[Events]
>   inputData.addData((1L to 100L).map(i => new Events(i)))
>   val events = inputData.toDF()
> .withColumn("w1", expr("0"))
> .withColumn("x1", expr("0"))
> .withColumn("y1", expr("0"))
> .withColumn("z1", expr("0"))
>   val numberOfSelects = 40 // set to 100+ and the planning takes 
> forever
>   val dfWithSelectsExpr = (2 to 
> numberOfSelects).foldLeft(events)((df,i) =>{
>   val arr = df.columns.++(Array(s"w${i-1} + rand() as 
> w$i", s"x${i-1} + rand() as x$i", s"y${i-1} + 2 as y$i", s"z${i-1} +1 as 
> z$i"))
>   df.selectExpr(arr:_*)
>   })
>   val withJoinAndFilter = dfWithSelectsExpr
> .join(lookupTable, expr("productId = pid"))
> .filter("productId < 50")
>   val query = withJoinAndFilter.writeStream
> .outputMode("append")
> .format("console")
> .trigger(ProcessingTime(2000))
> .start()
>   query.processAllAvailable()
>   spark.stop()
>   }
> }
> {code}
> the query progress output will show: 
> {code:java}
> "durationMs" : {
> "addBatch" : 1310,
> "getBatch" : 6,
> "getO

[jira] [Commented] (SPARK-21415) Triage scapegoat warnings, part 1

2017-07-17 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16089915#comment-16089915
 ] 

Kazuaki Ishizaki commented on SPARK-21415:
--

I see. If more JIRA entries are created for these scapegoat warning triages, we 
could group them under an umbrella issue.

> Triage scapegoat warnings, part 1
> -
>
> Key: SPARK-21415
> URL: https://issues.apache.org/jira/browse/SPARK-21415
> Project: Spark
>  Issue Type: Improvement
>  Components: GraphX, ML, Spark Core, SQL, Structured Streaming
>Affects Versions: 2.3.0
>Reporter: Sean Owen
>Assignee: Sean Owen
>Priority: Minor
>
> Following the results of the scapegoat plugin at 
> https://docs.google.com/spreadsheets/d/1z7xNMjx7VCJLCiHOHhTth7Hh4R0F6LwcGjEwCDzrCiM/edit#gid=767668040
>  and some initial triage, I'd like to address all of the valid instances of 
> some classes of warning:
> - BigDecimal double constructor
> - Catching NPE
> - Finalizer without super
> - List.size is O(n)
> - Prefer Seq.empty
> - Prefer Set.empty
> - reverse.map instead of reverseMap
> - Type shadowing
> - Unnecessary if condition.
> - Use .log1p
> - Var could be val
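
To make a few of the warning classes above concrete, some hedged before/after snippets; these are invented examples, not code from the Spark tree:

{code:java}
// "Prefer Seq.empty": avoid constructing a new empty collection on every call
val noRows: Seq[Int] = Seq.empty                       // instead of Seq[Int]()

// "List.size is O(n)": size walks the whole list, nonEmpty stops at the first element
def hasRows(rows: List[Int]): Boolean = rows.nonEmpty  // instead of rows.size > 0

// "Var could be val": the binding is never reassigned, so make it immutable
val total = (1 to 10).sum                              // instead of var total = (1 to 10).sum

// "Use .log1p": log1p(x) is more accurate than log(1 + x) for very small x
val y = math.log1p(1e-12)                              // instead of math.log(1 + 1e-12)
{code}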



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21390) Dataset filter api inconsistency

2017-07-16 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16089012#comment-16089012
 ] 

Kazuaki Ishizaki commented on SPARK-21390:
--

cc: [~ueshin] Do you have any thoughts on this?

> Dataset filter api inconsistency
> 
>
> Key: SPARK-21390
> URL: https://issues.apache.org/jira/browse/SPARK-21390
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1, 2.1.0, 2.2.0
>Reporter: Gheorghe Gheorghe
>Priority: Minor
>
> Hello everybody, 
> I've encountered a strange situation with the spark-shell.
> When I run the code below in my IDE the second test case prints as expected 
> count "1". However, when I run the same code using the spark-shell in the 
> second test case I get 0 back as a count. 
> I've made sure that I'm running scala 2.11.8 and spark 2.0.1 in both my IDE 
> and spark-shell. 
> {code:java}
>   import org.apache.spark.sql.Dataset
>   case class SomeClass(field1:String, field2:String)
>   val filterCondition: Seq[SomeClass] = Seq( SomeClass("00", "01") )
>   // Test 1
>   val filterMe1: Dataset[SomeClass] = Seq( SomeClass("00", "01") ).toDS
>   
>   println("Works fine!" +filterMe1.filter(filterCondition.contains(_)).count)
>   
>   // Test 2
>   case class OtherClass(field1:String, field2:String)
>   
>   val filterMe2 = Seq( OtherClass("00", "01"), OtherClass("00", "02")).toDS
>   println("Fail, count should return 1: " + filterMe2.filter(x=> 
> filterCondition.contains(SomeClass(x.field1, x.field2))).count)
> {code}
> Note if I transform the dataset first I get 1 back as expected.
> {code:java}
>  println(filterMe2.map(x=> SomeClass(x.field1, 
> x.field2)).filter(filterCondition.contains(_)).count)
> {code}
> Is this a bug? I can see that this filter function has been marked as 
> experimental 
> https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21390) Dataset filter api inconsistency

2017-07-16 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088977#comment-16088977
 ] 

Kazuaki Ishizaki commented on SPARK-21390:
--

When I ran the following test case in ReplSuite.scala, it failed at the last 
assertion.

{code:java}
  test("SPARK-21390: incorrect filter with case class") {
val output = runInterpreter("local",
  """
|case class SomeClass(f1: String, f2: String)
|val ds = Seq(SomeClass("a", "b")).toDS
|val filterCond = Seq(SomeClass("a", "b"))
|ds.filter(x => filterCond.contains(SomeClass(x.f1, x.f2))).show
  """.stripMargin)
print(s"$output\n")
assertDoesNotContain("error:", output)
assertDoesNotContain("Exception", output)
assertContains("|  a| b|", output)
  }
{code}

> Dataset filter api inconsistency
> 
>
> Key: SPARK-21390
> URL: https://issues.apache.org/jira/browse/SPARK-21390
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1, 2.1.0, 2.2.0
>Reporter: Gheorghe Gheorghe
>Priority: Minor
>
> Hello everybody, 
> I've encountered a strange situation with the spark-shell.
> When I run the code below in my IDE the second test case prints as expected 
> count "1". However, when I run the same code using the spark-shell in the 
> second test case I get 0 back as a count. 
> I've made sure that I'm running scala 2.11.8 and spark 2.0.1 in both my IDE 
> and spark-shell. 
> {code:java}
>   import org.apache.spark.sql.Dataset
>   case class SomeClass(field1:String, field2:String)
>   val filterCondition: Seq[SomeClass] = Seq( SomeClass("00", "01") )
>   // Test 1
>   val filterMe1: Dataset[SomeClass] = Seq( SomeClass("00", "01") ).toDS
>   
>   println("Works fine!" +filterMe1.filter(filterCondition.contains(_)).count)
>   
>   // Test 2
>   case class OtherClass(field1:String, field2:String)
>   
>   val filterMe2 = Seq( OtherClass("00", "01"), OtherClass("00", "02")).toDS
>   println("Fail, count should return 1: " + filterMe2.filter(x=> 
> filterCondition.contains(SomeClass(x.field1, x.field2))).count)
> {code}
> Note if I transform the dataset first I get 1 back as expected.
> {code:java}
>  println(filterMe2.map(x=> SomeClass(x.field1, 
> x.field2)).filter(filterCondition.contains(_)).count)
> {code}
> Is this a bug? I can see that this filter function has been marked as 
> experimental 
> https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-21418) NoSuchElementException: None.get on DataFrame.rdd

2017-07-16 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088659#comment-16088659
 ] 

Kazuaki Ishizaki edited comment on SPARK-21418 at 7/16/17 8:25 AM:
---

I am curious why {{java.io.ObjectOutputStream.writeOrdinaryObject}} calls the 
{{toString}} method. Do you pass any special JVM options when running this program?


was (Author: kiszk):
I am curious why {{java.io.ObjectOutputStream.writeOrdinaryObject}} calls 
`toString` method. Do you specify some option to run this program for JVM?

> NoSuchElementException: None.get on DataFrame.rdd
> -
>
> Key: SPARK-21418
> URL: https://issues.apache.org/jira/browse/SPARK-21418
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Daniel Darabos
>
> I don't have a minimal reproducible example yet, sorry. I have the following 
> lines in a unit test for our Spark application:
> {code}
> val df = mySparkSession.read.format("jdbc")
>   .options(Map("url" -> url, "dbtable" -> "test_table"))
>   .load()
> df.show
> println(df.rdd.collect)
> {code}
> The output shows the DataFrame contents from {{df.show}}. But the {{collect}} 
> fails:
> {noformat}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 
> serialization failed: java.util.NoSuchElementException: None.get
> java.util.NoSuchElementException: None.get
>   at scala.None$.get(Option.scala:347)
>   at scala.None$.get(Option.scala:345)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$class.org$apache$spark$sql$execution$DataSourceScanExec$$redact(DataSourceScanExec.scala:70)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$$anonfun$4.apply(DataSourceScanExec.scala:54)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$$anonfun$4.apply(DataSourceScanExec.scala:52)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$class.simpleString(DataSourceScanExec.scala:52)
>   at 
> org.apache.spark.sql.execution.RowDataSourceScanExec.simpleString(DataSourceScanExec.scala:75)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.verboseString(QueryPlan.scala:349)
>   at 
> org.apache.spark.sql.execution.RowDataSourceScanExec.org$apache$spark$sql$execution$DataSourceScanExec$$super$verboseString(DataSourceScanExec.scala:75)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$class.verboseString(DataSourceScanExec.scala:60)
>   at 
> org.apache.spark.sql.execution.RowDataSourceScanExec.verboseString(DataSourceScanExec.scala:75)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:556)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec.generateTreeString(WholeStageCodegenExec.scala:451)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:576)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:480)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:477)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.toString(TreeNode.scala:474)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1421)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStre

[jira] [Comment Edited] (SPARK-21393) spark (pyspark) crashes unpredictably when using show() or toPandas()

2017-07-15 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088660#comment-16088660
 ] 

Kazuaki Ishizaki edited comment on SPARK-21393 at 7/15/17 4:53 PM:
---

I confirmed that this Python program works well, without an exception, after 
applying the PR for SPARK-21413.


was (Author: kiszk):
I confirmed that this python program works well after applying a PR for 
SPARK-21413.

> spark (pyspark) crashes unpredictably when using show() or toPandas()
> -
>
> Key: SPARK-21393
> URL: https://issues.apache.org/jira/browse/SPARK-21393
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.1.1, 2.2.0
> Environment: Windows 10
> python 2.7
>Reporter: Zahra
> Attachments: Data.zip, working_ST_pyspark.py
>
>
> unpredictably run into this error either when using 
> `pyspark.sql.DataFrame.show()` or `pyspark.sql.DataFrame.toPandas()`
> error log starts with  (truncated) :
> {noformat}
> 17/07/12 16:03:09 ERROR CodeGenerator: failed to compile: 
> org.codehaus.janino.JaninoRuntimeException: Code of method 
> "apply_47$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V"
>  of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection"
>  grows beyond 64 KB
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */   return new SpecificUnsafeProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificUnsafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */   private scala.collection.immutable.Set set;
> /* 009 */   private scala.collection.immutable.Set set1;
> /* 010 */   private scala.collection.immutable.Set set2;
> /* 011 */   private scala.collection.immutable.Set set3;
> /* 012 */   private UTF8String.IntWrapper wrapper;
> /* 013 */   private UTF8String.IntWrapper wrapper1;
> /* 014 */   private scala.collection.immutable.Set set4;
> /* 015 */   private UTF8String.IntWrapper wrapper2;
> /* 016 */   private UTF8String.IntWrapper wrapper3;
> /* 017 */   private scala.collection.immutable.Set set5;
> /* 018 */   private scala.collection.immutable.Set set6;
> /* 019 */   private scala.collection.immutable.Set set7;
> /* 020 */   private UTF8String.IntWrapper wrapper4;
> /* 021 */   private UTF8String.IntWrapper wrapper5;
> /* 022 */   private scala.collection.immutable.Set set8;
> /* 023 */   private UTF8String.IntWrapper wrapper6;
> /* 024 */   private UTF8String.IntWrapper wrapper7;
> /* 025 */   private scala.collection.immutable.Set set9;
> /* 026 */   private scala.collection.immutable.Set set10;
> /* 027 */   private scala.collection.immutable.Set set11;
> /* 028 */   private UTF8String.IntWrapper wrapper8;
> /* 029 */   private UTF8String.IntWrapper wrapper9;
> /* 030 */   private scala.collection.immutable.Set set12;
> /* 031 */   private UTF8String.IntWrapper wrapper10;
> /* 032 */   private UTF8String.IntWrapper wrapper11;
> /* 033 */   private scala.collection.immutable.Set set13;
> /* 034 */   private scala.collection.immutable.Set set14;
> /* 035 */   private scala.collection.immutable.Set set15;
> /* 036 */   private UTF8String.IntWrapper wrapper12;
> /* 037 */   private UTF8String.IntWrapper wrapper13;
> /* 038 */   private scala.collection.immutable.Set set16;
> /* 039 */   private UTF8String.IntWrapper wrapper14;
> /* 040 */   private UTF8String.IntWrapper wrapper15;
> /* 041 */   private scala.collection.immutable.Set set17;
> /* 042 */   private scala.collection.immutable.Set set18;
> /* 043 */   private scala.collection.immutable.Set set19;
> /* 044 */   private UTF8String.IntWrapper wrapper16;
> /* 045 */   private UTF8String.IntWrapper wrapper17;
> /* 046 */   private scala.collection.immutable.Set set20;
> /* 047 */   private UTF8String.IntWrapper wrapper18;
> /* 048 */   private UTF8String.IntWrapper wrapper19;
> /* 049 */   private scala.collection.immutable.Set set21;
> /* 050 */   private scala.collection.immutable.Set set22;
> /* 051 */   private scala.collection.immutable.Set set23;
> /* 052 */   private UTF8String.IntWrapper wrapper20;
> /* 053 */   private UTF8String.IntWrapper wrapper21;
> /* 054 */   private scala.collection.immutable.Set set24;
> /* 055 */   private UTF8String.IntWrapper wrapper22;
> /* 056 */   private UTF8String.IntWrapper wrapper23;
> /* 057 */   private scala.collection.immutable.Set set25;
> /* 058 */   private scala.collection.immutable.Set set26;
> /* 059 */   private scala.collection.immutable.Set set27;
> /* 060 */   private UTF8String.IntWrapper wrapper24;
> /* 061 */   private UTF8String.I

[jira] [Commented] (SPARK-21393) spark (pyspark) crashes unpredictably when using show() or toPandas()

2017-07-15 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088660#comment-16088660
 ] 

Kazuaki Ishizaki commented on SPARK-21393:
--

I confirmed that this python program works well after applying a PR for 
SPARK-21413.

> spark (pyspark) crashes unpredictably when using show() or toPandas()
> -
>
> Key: SPARK-21393
> URL: https://issues.apache.org/jira/browse/SPARK-21393
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.1.1, 2.2.0
> Environment: Windows 10
> python 2.7
>Reporter: Zahra
> Attachments: Data.zip, working_ST_pyspark.py
>
>
> unpredictably run into this error either when using 
> `pyspark.sql.DataFrame.show()` or `pyspark.sql.DataFrame.toPandas()`
> error log starts with  (truncated) :
> {noformat}
> 17/07/12 16:03:09 ERROR CodeGenerator: failed to compile: 
> org.codehaus.janino.JaninoRuntimeException: Code of method 
> "apply_47$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V"
>  of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection"
>  grows beyond 64 KB
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */   return new SpecificUnsafeProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificUnsafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */   private scala.collection.immutable.Set set;
> /* 009 */   private scala.collection.immutable.Set set1;
> /* 010 */   private scala.collection.immutable.Set set2;
> /* 011 */   private scala.collection.immutable.Set set3;
> /* 012 */   private UTF8String.IntWrapper wrapper;
> /* 013 */   private UTF8String.IntWrapper wrapper1;
> /* 014 */   private scala.collection.immutable.Set set4;
> /* 015 */   private UTF8String.IntWrapper wrapper2;
> /* 016 */   private UTF8String.IntWrapper wrapper3;
> /* 017 */   private scala.collection.immutable.Set set5;
> /* 018 */   private scala.collection.immutable.Set set6;
> /* 019 */   private scala.collection.immutable.Set set7;
> /* 020 */   private UTF8String.IntWrapper wrapper4;
> /* 021 */   private UTF8String.IntWrapper wrapper5;
> /* 022 */   private scala.collection.immutable.Set set8;
> /* 023 */   private UTF8String.IntWrapper wrapper6;
> /* 024 */   private UTF8String.IntWrapper wrapper7;
> /* 025 */   private scala.collection.immutable.Set set9;
> /* 026 */   private scala.collection.immutable.Set set10;
> /* 027 */   private scala.collection.immutable.Set set11;
> /* 028 */   private UTF8String.IntWrapper wrapper8;
> /* 029 */   private UTF8String.IntWrapper wrapper9;
> /* 030 */   private scala.collection.immutable.Set set12;
> /* 031 */   private UTF8String.IntWrapper wrapper10;
> /* 032 */   private UTF8String.IntWrapper wrapper11;
> /* 033 */   private scala.collection.immutable.Set set13;
> /* 034 */   private scala.collection.immutable.Set set14;
> /* 035 */   private scala.collection.immutable.Set set15;
> /* 036 */   private UTF8String.IntWrapper wrapper12;
> /* 037 */   private UTF8String.IntWrapper wrapper13;
> /* 038 */   private scala.collection.immutable.Set set16;
> /* 039 */   private UTF8String.IntWrapper wrapper14;
> /* 040 */   private UTF8String.IntWrapper wrapper15;
> /* 041 */   private scala.collection.immutable.Set set17;
> /* 042 */   private scala.collection.immutable.Set set18;
> /* 043 */   private scala.collection.immutable.Set set19;
> /* 044 */   private UTF8String.IntWrapper wrapper16;
> /* 045 */   private UTF8String.IntWrapper wrapper17;
> /* 046 */   private scala.collection.immutable.Set set20;
> /* 047 */   private UTF8String.IntWrapper wrapper18;
> /* 048 */   private UTF8String.IntWrapper wrapper19;
> /* 049 */   private scala.collection.immutable.Set set21;
> /* 050 */   private scala.collection.immutable.Set set22;
> /* 051 */   private scala.collection.immutable.Set set23;
> /* 052 */   private UTF8String.IntWrapper wrapper20;
> /* 053 */   private UTF8String.IntWrapper wrapper21;
> /* 054 */   private scala.collection.immutable.Set set24;
> /* 055 */   private UTF8String.IntWrapper wrapper22;
> /* 056 */   private UTF8String.IntWrapper wrapper23;
> /* 057 */   private scala.collection.immutable.Set set25;
> /* 058 */   private scala.collection.immutable.Set set26;
> /* 059 */   private scala.collection.immutable.Set set27;
> /* 060 */   private UTF8String.IntWrapper wrapper24;
> /* 061 */   private UTF8String.IntWrapper wrapper25;
> /* 062 */   private scala.collection.immutable.Set set28;
> /* 063 */   private UTF8String.IntWrapper wrapper26;
> /* 064 */   private UTF8String.IntWrapper

[jira] [Comment Edited] (SPARK-21418) NoSuchElementException: None.get on DataFrame.rdd

2017-07-15 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088659#comment-16088659
 ] 

Kazuaki Ishizaki edited comment on SPARK-21418 at 7/15/17 4:43 PM:
---

I am curious why {{java.io.ObjectOutputStream.writeOrdinaryObject}} calls the 
`toString` method. Do you pass any special options to the JVM when running this program?
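
One possibility, judging from the quoted stack trace: the JDK's {{ObjectOutputStream}} only calls {{toString}} on the object being written when extended serialization debug info is enabled, for example via a driver JVM option like the following (shown here only as an illustration, not taken from the reporter's setup):

{noformat}
--conf "spark.driver.extraJavaOptions=-Dsun.io.serialization.extendedDebugInfo=true"
{noformat}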


was (Author: kiszk):
I am curious why {java.io.ObjectOutputStream.writeOrdinaryObject} calls the 
`toString` method. Do you pass any special options to the JVM when running this program?

> NoSuchElementException: None.get on DataFrame.rdd
> -
>
> Key: SPARK-21418
> URL: https://issues.apache.org/jira/browse/SPARK-21418
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Daniel Darabos
>
> I don't have a minimal reproducible example yet, sorry. I have the following 
> lines in a unit test for our Spark application:
> {code}
> val df = mySparkSession.read.format("jdbc")
>   .options(Map("url" -> url, "dbtable" -> "test_table"))
>   .load()
> df.show
> println(df.rdd.collect)
> {code}
> The output shows the DataFrame contents from {{df.show}}. But the {{collect}} 
> fails:
> {noformat}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 
> serialization failed: java.util.NoSuchElementException: None.get
> java.util.NoSuchElementException: None.get
>   at scala.None$.get(Option.scala:347)
>   at scala.None$.get(Option.scala:345)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$class.org$apache$spark$sql$execution$DataSourceScanExec$$redact(DataSourceScanExec.scala:70)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$$anonfun$4.apply(DataSourceScanExec.scala:54)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$$anonfun$4.apply(DataSourceScanExec.scala:52)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$class.simpleString(DataSourceScanExec.scala:52)
>   at 
> org.apache.spark.sql.execution.RowDataSourceScanExec.simpleString(DataSourceScanExec.scala:75)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.verboseString(QueryPlan.scala:349)
>   at 
> org.apache.spark.sql.execution.RowDataSourceScanExec.org$apache$spark$sql$execution$DataSourceScanExec$$super$verboseString(DataSourceScanExec.scala:75)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$class.verboseString(DataSourceScanExec.scala:60)
>   at 
> org.apache.spark.sql.execution.RowDataSourceScanExec.verboseString(DataSourceScanExec.scala:75)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:556)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec.generateTreeString(WholeStageCodegenExec.scala:451)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:576)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:480)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:477)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.toString(TreeNode.scala:474)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1421)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.j

[jira] [Commented] (SPARK-21418) NoSuchElementException: None.get on DataFrame.rdd

2017-07-15 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088659#comment-16088659
 ] 

Kazuaki Ishizaki commented on SPARK-21418:
--

I am curious why {java.io.ObjectOutputStream.writeOrdinaryObject} calls the 
`toString` method. Do you pass any special options to the JVM when running this program?

> NoSuchElementException: None.get on DataFrame.rdd
> -
>
> Key: SPARK-21418
> URL: https://issues.apache.org/jira/browse/SPARK-21418
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Daniel Darabos
>
> I don't have a minimal reproducible example yet, sorry. I have the following 
> lines in a unit test for our Spark application:
> {code}
> val df = mySparkSession.read.format("jdbc")
>   .options(Map("url" -> url, "dbtable" -> "test_table"))
>   .load()
> df.show
> println(df.rdd.collect)
> {code}
> The output shows the DataFrame contents from {{df.show}}. But the {{collect}} 
> fails:
> {noformat}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 
> serialization failed: java.util.NoSuchElementException: None.get
> java.util.NoSuchElementException: None.get
>   at scala.None$.get(Option.scala:347)
>   at scala.None$.get(Option.scala:345)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$class.org$apache$spark$sql$execution$DataSourceScanExec$$redact(DataSourceScanExec.scala:70)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$$anonfun$4.apply(DataSourceScanExec.scala:54)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$$anonfun$4.apply(DataSourceScanExec.scala:52)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$class.simpleString(DataSourceScanExec.scala:52)
>   at 
> org.apache.spark.sql.execution.RowDataSourceScanExec.simpleString(DataSourceScanExec.scala:75)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.verboseString(QueryPlan.scala:349)
>   at 
> org.apache.spark.sql.execution.RowDataSourceScanExec.org$apache$spark$sql$execution$DataSourceScanExec$$super$verboseString(DataSourceScanExec.scala:75)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$class.verboseString(DataSourceScanExec.scala:60)
>   at 
> org.apache.spark.sql.execution.RowDataSourceScanExec.verboseString(DataSourceScanExec.scala:75)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:556)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec.generateTreeString(WholeStageCodegenExec.scala:451)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:576)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:480)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:477)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.toString(TreeNode.scala:474)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1421)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject

[jira] [Commented] (SPARK-21393) spark (pyspark) crashes unpredictably when using show() or toPandas()

2017-07-14 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087391#comment-16087391
 ] 

Kazuaki Ishizaki commented on SPARK-21393:
--

Not yet; however, I have created a patch that avoids the failure for the program in 
SPARK-21413.
I will submit a pull request once I can create a test suite for this patch.
Then, I expect it to be merged into master.

> spark (pyspark) crashes unpredictably when using show() or toPandas()
> -
>
> Key: SPARK-21393
> URL: https://issues.apache.org/jira/browse/SPARK-21393
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.1.1, 2.2.0
> Environment: Windows 10
> python 2.7
>Reporter: Zahra
> Attachments: Data.zip, working_ST_pyspark.py
>
>
> unpredictably run into this error either when using 
> `pyspark.sql.DataFrame.show()` or `pyspark.sql.DataFrame.toPandas()`
> error log starts with  (truncated) :
> {noformat}
> 17/07/12 16:03:09 ERROR CodeGenerator: failed to compile: 
> org.codehaus.janino.JaninoRuntimeException: Code of method 
> "apply_47$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V"
>  of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection"
>  grows beyond 64 KB
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */   return new SpecificUnsafeProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificUnsafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */   private scala.collection.immutable.Set set;
> /* 009 */   private scala.collection.immutable.Set set1;
> /* 010 */   private scala.collection.immutable.Set set2;
> /* 011 */   private scala.collection.immutable.Set set3;
> /* 012 */   private UTF8String.IntWrapper wrapper;
> /* 013 */   private UTF8String.IntWrapper wrapper1;
> /* 014 */   private scala.collection.immutable.Set set4;
> /* 015 */   private UTF8String.IntWrapper wrapper2;
> /* 016 */   private UTF8String.IntWrapper wrapper3;
> /* 017 */   private scala.collection.immutable.Set set5;
> /* 018 */   private scala.collection.immutable.Set set6;
> /* 019 */   private scala.collection.immutable.Set set7;
> /* 020 */   private UTF8String.IntWrapper wrapper4;
> /* 021 */   private UTF8String.IntWrapper wrapper5;
> /* 022 */   private scala.collection.immutable.Set set8;
> /* 023 */   private UTF8String.IntWrapper wrapper6;
> /* 024 */   private UTF8String.IntWrapper wrapper7;
> /* 025 */   private scala.collection.immutable.Set set9;
> /* 026 */   private scala.collection.immutable.Set set10;
> /* 027 */   private scala.collection.immutable.Set set11;
> /* 028 */   private UTF8String.IntWrapper wrapper8;
> /* 029 */   private UTF8String.IntWrapper wrapper9;
> /* 030 */   private scala.collection.immutable.Set set12;
> /* 031 */   private UTF8String.IntWrapper wrapper10;
> /* 032 */   private UTF8String.IntWrapper wrapper11;
> /* 033 */   private scala.collection.immutable.Set set13;
> /* 034 */   private scala.collection.immutable.Set set14;
> /* 035 */   private scala.collection.immutable.Set set15;
> /* 036 */   private UTF8String.IntWrapper wrapper12;
> /* 037 */   private UTF8String.IntWrapper wrapper13;
> /* 038 */   private scala.collection.immutable.Set set16;
> /* 039 */   private UTF8String.IntWrapper wrapper14;
> /* 040 */   private UTF8String.IntWrapper wrapper15;
> /* 041 */   private scala.collection.immutable.Set set17;
> /* 042 */   private scala.collection.immutable.Set set18;
> /* 043 */   private scala.collection.immutable.Set set19;
> /* 044 */   private UTF8String.IntWrapper wrapper16;
> /* 045 */   private UTF8String.IntWrapper wrapper17;
> /* 046 */   private scala.collection.immutable.Set set20;
> /* 047 */   private UTF8String.IntWrapper wrapper18;
> /* 048 */   private UTF8String.IntWrapper wrapper19;
> /* 049 */   private scala.collection.immutable.Set set21;
> /* 050 */   private scala.collection.immutable.Set set22;
> /* 051 */   private scala.collection.immutable.Set set23;
> /* 052 */   private UTF8String.IntWrapper wrapper20;
> /* 053 */   private UTF8String.IntWrapper wrapper21;
> /* 054 */   private scala.collection.immutable.Set set24;
> /* 055 */   private UTF8String.IntWrapper wrapper22;
> /* 056 */   private UTF8String.IntWrapper wrapper23;
> /* 057 */   private scala.collection.immutable.Set set25;
> /* 058 */   private scala.collection.immutable.Set set26;
> /* 059 */   private scala.collection.immutable.Set set27;
> /* 060 */   private UTF8String.IntWrapper wrapper24;
> /* 061 */   private UTF8String.IntWrapper wrapper25;
> /* 062 */   private sca

[jira] [Commented] (SPARK-21415) Triage scapegoat warnings, part 1

2017-07-14 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087097#comment-16087097
 ] 

Kazuaki Ishizaki commented on SPARK-21415:
--

Thank you. Would it be better to create an umbrella JIRA entry for the scapegoat 
triage work?

> Triage scapegoat warnings, part 1
> -
>
> Key: SPARK-21415
> URL: https://issues.apache.org/jira/browse/SPARK-21415
> Project: Spark
>  Issue Type: Improvement
>  Components: GraphX, ML, Spark Core, SQL, Structured Streaming
>Affects Versions: 2.3.0
>Reporter: Sean Owen
>Assignee: Sean Owen
>Priority: Minor
>
> Following the results of the scapegoat plugin at 
> https://docs.google.com/spreadsheets/d/1z7xNMjx7VCJLCiHOHhTth7Hh4R0F6LwcGjEwCDzrCiM/edit#gid=767668040
>  and some initial triage, I'd like to address all of the valid instances of 
> some classes of warning:
> - BigDecimal double constructor
> - Catching NPE
> - Finalizer without super
> - List.size is O(n)
> - Prefer Seq.empty
> - Prefer Set.empty
> - reverse.map instead of reverseMap
> - Type shadowing
> - Unnecessary if condition.
> - Use .log1p
> - Var could be val
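
For illustration, here is a minimal Scala sketch (hypothetical snippets, not taken from the Spark codebase) of the kind of rewrite several of the warning categories above call for:

{code}
object ScapegoatExamples {
  def process(xs: Seq[Int]): Unit = println(xs.sum)

  def main(args: Array[String]): Unit = {
    // "Var could be val": the binding is never reassigned.
    val threshold = 0.5                      // was: var threshold = 0.5

    // "Prefer Seq.empty" / "Prefer Set.empty": avoid allocating via Seq()/Set().
    val noRows: Seq[Int] = Seq.empty         // was: Seq()
    val noKeys: Set[String] = Set.empty      // was: Set()

    // "List.size is O(n)": prefer a constant-time emptiness check.
    val items = List(1, 2, 3)
    if (items.nonEmpty) process(items)       // was: if (items.size > 0) ...

    // "reverse.map instead of reverseMap": fuse the two traversals.
    val flipped = items.reverseMap(_ * 2)    // was: items.reverse.map(_ * 2)

    // "BigDecimal double constructor": construct from a String to avoid
    // binary floating-point artifacts.
    val amount = BigDecimal("0.1")           // was: BigDecimal(0.1)

    println((threshold, noRows, noKeys, flipped, amount))
  }
}
{code}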



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21413) Multiple projections with CASE WHEN fails to run generated codes

2017-07-13 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16086927#comment-16086927
 ] 

Kazuaki Ishizaki commented on SPARK-21413:
--

Thank you for preparing a good repro. I can reproduce this problem. I think 
this causes the same failure as SPARK-21393.
I am working on this.
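
For reference, the repro quoted below can be condensed into a loop; each {{withColumn}} wraps the previous {{fieldA}} expression in another CASE WHEN, so a single generated {{apply}} method keeps growing until it exceeds the 64 KB method-size limit. A sketch of the same repro (assuming a {{spark}} session, as in the spark-shell):

{code}
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

val schema = StructType(StructField("fieldA", IntegerType) :: Nil)
var df = spark.createDataFrame(spark.sparkContext.parallelize(Seq(Row(1))), schema)

// Each iteration nests the previous fieldA expression inside another CASE WHEN.
(1 to 10).foreach { _ =>
  df = df.withColumn("fieldA", when(col("fieldA") === 0, null).otherwise(col("fieldA")))
}

df.explain()  // shows one huge nested case-when projection
df.show()     // fails with "... grows beyond 64 KB" on affected versions
{code}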

> Multiple projections with CASE WHEN fails to run generated codes
> 
>
> Key: SPARK-21413
> URL: https://issues.apache.org/jira/browse/SPARK-21413
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Hyukjin Kwon
>
> Scala code to reproduce is below:
> {code}
> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.types._
> import org.apache.spark.sql.Row
> val schema = StructType(StructField("fieldA", IntegerType) :: Nil)
> var df = spark.createDataFrame(spark.sparkContext.parallelize(Seq(Row(1))), 
> schema)
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df.show()
> {code}
> Calling {{explain()}} on the dataframe in the former case shows a huge 
> case-when projection, and {{show()}} fails with the exception below:
> {code}
> ...
> Caused by: org.codehaus.janino.JaninoRuntimeException: Code of method 
> "apply_0$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V"
>  of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection"
>  grows beyond 64 KB
>   at org.codehaus.janino.CodeContext.makeSpace(CodeContext.java:949)
>   at org.codehaus.janino.CodeContext.write(CodeContext.java:839)
>   at org.codehaus.janino.UnitCompiler.writeOpcode(UnitCompiler.java:11081)
>   at org.codehaus.janino.UnitCompiler.pushConstant(UnitCompiler.java:9674)
>   at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4911)
>   at org.codehaus.janino.UnitCompiler.access$7700(UnitCompiler.java:206)
>   at 
> org.codehaus.janino.UnitCompiler$12.visitIntegerLiteral(UnitCompiler.java:3776)
> ...
> {code}
> Note that I could not reproduce this with a local relation (this one is produced 
> by {{ConvertToLocalRelation}}).
> {code}
> import org.apache.spark.sql.functions._
> var df = Seq(1).toDF("fieldA")
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df.show()
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21393) spark (pyspark) crashes unpredictably when using show() or toPandas()

2017-07-13 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16086254#comment-16086254
 ] 

Kazuaki Ishizaki commented on SPARK-21393:
--

The following program causes the same exception:

{code}
from __future__ import absolute_import, division, print_function

import findspark
findspark.init()
import pyspark
from pyspark.sql.functions import *

from pyspark import SparkContext
from pyspark import SparkConf
from pyspark.sql import SQLContext
import pyspark.sql.functions as sf

sc = SparkContext()
sqlContext = SQLContext(sc)
### data
df = sqlContext.read.load('./Data/claims.csv', 
format='com.databricks.spark.csv', header=True)

df_new = df.withColumn('service_type_col',sf.when((sf.col('RevenueCategory') == 
"Emergency Room") | (sf.col('CPT_Name') == "EMERGENCY DEPT VISIT"), 
'EMERGENCY_CARE').otherwise(0))
df_new = df_new.withColumn('service_type_col', 
sf.when((sf.col('ProcedureCategory').isin([ "Laboratory, General"])) & 
(sf.col('service_type_col') == 0), 
'LAB_AND_PATHOLOGY').otherwise(df_new.service_type_col))
df_new = df_new.withColumn('service_type_col', 
sf.when((sf.col('service_type_col') == 0), 
'ROUTINE_RADIOLOGY').otherwise(df_new.service_type_col))
df_new = df_new.withColumn('service_type_col', 
sf.when((sf.col('CPT_Code').isin(["70336"])) & (sf.col('service_type_col') == 
0), 'ADVANCED_IMAGING').otherwise(df_new.service_type_col))
df_new = df_new.withColumn('service_type_col', 
sf.when((sf.col('service_type_col') == 0), 
'DURABLE_MEDICAL_EQUIPMENT').otherwise(df_new.service_type_col))
df_new = df_new.withColumn('service_type_col', 
sf.when((sf.col('CPT_Name').isin(['CHIROPRACTIC MANIPULATION'])) & 
(sf.col('service_type_col') == 0), 
'CHIROPRACTIC').otherwise(df_new.service_type_col))
df_new = df_new.withColumn('service_type_col', 
sf.when((sf.col('service_type_col') == 0), 
'AMBULANCE').otherwise(df_new.service_type_col))
df_new = df_new.withColumn('service_type_col', 
sf.when((sf.col('service_type_col') == 0), 
'RX_MAIL').otherwise(df_new.service_type_col))

df_new.show()
{code}

> spark (pyspark) crashes unpredictably when using show() or toPandas()
> -
>
> Key: SPARK-21393
> URL: https://issues.apache.org/jira/browse/SPARK-21393
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.1.1, 2.2.0
> Environment: Windows 10
> python 2.7
>Reporter: Zahra
> Attachments: Data.zip, working_ST_pyspark.py
>
>
> unpredictably run into this error either when using 
> `pyspark.sql.DataFrame.show()` or `pyspark.sql.DataFrame.toPandas()`
> error log starts with  (truncated) :
> {noformat}
> 17/07/12 16:03:09 ERROR CodeGenerator: failed to compile: 
> org.codehaus.janino.JaninoRuntimeException: Code of method 
> "apply_47$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V"
>  of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection"
>  grows beyond 64 KB
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */   return new SpecificUnsafeProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificUnsafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */   private scala.collection.immutable.Set set;
> /* 009 */   private scala.collection.immutable.Set set1;
> /* 010 */   private scala.collection.immutable.Set set2;
> /* 011 */   private scala.collection.immutable.Set set3;
> /* 012 */   private UTF8String.IntWrapper wrapper;
> /* 013 */   private UTF8String.IntWrapper wrapper1;
> /* 014 */   private scala.collection.immutable.Set set4;
> /* 015 */   private UTF8String.IntWrapper wrapper2;
> /* 016 */   private UTF8String.IntWrapper wrapper3;
> /* 017 */   private scala.collection.immutable.Set set5;
> /* 018 */   private scala.collection.immutable.Set set6;
> /* 019 */   private scala.collection.immutable.Set set7;
> /* 020 */   private UTF8String.IntWrapper wrapper4;
> /* 021 */   private UTF8String.IntWrapper wrapper5;
> /* 022 */   private scala.collection.immutable.Set set8;
> /* 023 */   private UTF8String.IntWrapper wrapper6;
> /* 024 */   private UTF8String.IntWrapper wrapper7;
> /* 025 */   private scala.collection.immutable.Set set9;
> /* 026 */   private scala.collection.immutable.Set set10;
> /* 027 */   private scala.collection.immutable.Set set11;
> /* 028 */   private UTF8String.IntWrapper wrapper8;
> /* 029 */   private UTF8String.IntWrapper wrapper9;
> /* 030 */   private scala.collection.immutable.Set set12;
> /* 031 */   private UTF8String.IntWrapper wrapper10;
> /* 032 */   private UTF8String.IntWrapper wrapper11;
> /* 033 */   private scala.co

[jira] [Comment Edited] (SPARK-21393) spark (pyspark) crashes unpredictably when using show() or toPandas()

2017-07-13 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16086189#comment-16086189
 ] 

Kazuaki Ishizaki edited comment on SPARK-21393 at 7/13/17 6:39 PM:
---

Thank you for uploading the files. When I insert {df_new.show()} at appropriate 
places, I can reproduce this problem on Spark 2.1.1 or Spark 2.2.
I am reducing this program to fewer lines.


was (Author: kiszk):
Thank you for uploading files. When I insert {df_new.show()} at appropriate 
places, I can reproduce this problem on Spark 2.1.1 or Spark 2.2.
I am reducing the program.

> spark (pyspark) crashes unpredictably when using show() or toPandas()
> -
>
> Key: SPARK-21393
> URL: https://issues.apache.org/jira/browse/SPARK-21393
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.1.1, 2.2.0
> Environment: Windows 10
> python 2.7
>Reporter: Zahra
> Attachments: Data.zip, working_ST_pyspark.py
>
>
> unpredictably run into this error either when using 
> `pyspark.sql.DataFrame.show()` or `pyspark.sql.DataFrame.toPandas()`
> error log starts with  (truncated) :
> {noformat}
> 17/07/12 16:03:09 ERROR CodeGenerator: failed to compile: 
> org.codehaus.janino.JaninoRuntimeException: Code of method 
> "apply_47$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V"
>  of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection"
>  grows beyond 64 KB
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */   return new SpecificUnsafeProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificUnsafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */   private scala.collection.immutable.Set set;
> /* 009 */   private scala.collection.immutable.Set set1;
> /* 010 */   private scala.collection.immutable.Set set2;
> /* 011 */   private scala.collection.immutable.Set set3;
> /* 012 */   private UTF8String.IntWrapper wrapper;
> /* 013 */   private UTF8String.IntWrapper wrapper1;
> /* 014 */   private scala.collection.immutable.Set set4;
> /* 015 */   private UTF8String.IntWrapper wrapper2;
> /* 016 */   private UTF8String.IntWrapper wrapper3;
> /* 017 */   private scala.collection.immutable.Set set5;
> /* 018 */   private scala.collection.immutable.Set set6;
> /* 019 */   private scala.collection.immutable.Set set7;
> /* 020 */   private UTF8String.IntWrapper wrapper4;
> /* 021 */   private UTF8String.IntWrapper wrapper5;
> /* 022 */   private scala.collection.immutable.Set set8;
> /* 023 */   private UTF8String.IntWrapper wrapper6;
> /* 024 */   private UTF8String.IntWrapper wrapper7;
> /* 025 */   private scala.collection.immutable.Set set9;
> /* 026 */   private scala.collection.immutable.Set set10;
> /* 027 */   private scala.collection.immutable.Set set11;
> /* 028 */   private UTF8String.IntWrapper wrapper8;
> /* 029 */   private UTF8String.IntWrapper wrapper9;
> /* 030 */   private scala.collection.immutable.Set set12;
> /* 031 */   private UTF8String.IntWrapper wrapper10;
> /* 032 */   private UTF8String.IntWrapper wrapper11;
> /* 033 */   private scala.collection.immutable.Set set13;
> /* 034 */   private scala.collection.immutable.Set set14;
> /* 035 */   private scala.collection.immutable.Set set15;
> /* 036 */   private UTF8String.IntWrapper wrapper12;
> /* 037 */   private UTF8String.IntWrapper wrapper13;
> /* 038 */   private scala.collection.immutable.Set set16;
> /* 039 */   private UTF8String.IntWrapper wrapper14;
> /* 040 */   private UTF8String.IntWrapper wrapper15;
> /* 041 */   private scala.collection.immutable.Set set17;
> /* 042 */   private scala.collection.immutable.Set set18;
> /* 043 */   private scala.collection.immutable.Set set19;
> /* 044 */   private UTF8String.IntWrapper wrapper16;
> /* 045 */   private UTF8String.IntWrapper wrapper17;
> /* 046 */   private scala.collection.immutable.Set set20;
> /* 047 */   private UTF8String.IntWrapper wrapper18;
> /* 048 */   private UTF8String.IntWrapper wrapper19;
> /* 049 */   private scala.collection.immutable.Set set21;
> /* 050 */   private scala.collection.immutable.Set set22;
> /* 051 */   private scala.collection.immutable.Set set23;
> /* 052 */   private UTF8String.IntWrapper wrapper20;
> /* 053 */   private UTF8String.IntWrapper wrapper21;
> /* 054 */   private scala.collection.immutable.Set set24;
> /* 055 */   private UTF8String.IntWrapper wrapper22;
> /* 056 */   private UTF8String.IntWrapper wrapper23;
> /* 057 */   private scala.collection.immutable.Set set25;
> /* 058 */   private scala.collection

[jira] [Commented] (SPARK-21393) spark (pyspark) crashes unpredictably when using show() or toPandas()

2017-07-13 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16086189#comment-16086189
 ] 

Kazuaki Ishizaki commented on SPARK-21393:
--

Thank you for uploading files. When I insert {df_new.show()} at appropriate 
places, I can reproduce this problem on Spark 2.1.1 or Spark 2.2.
I am reducing the program.

> spark (pyspark) crashes unpredictably when using show() or toPandas()
> -
>
> Key: SPARK-21393
> URL: https://issues.apache.org/jira/browse/SPARK-21393
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.1.1, 2.2.0
> Environment: Windows 10
> python 2.7
>Reporter: Zahra
> Attachments: Data.zip, working_ST_pyspark.py
>
>
> unpredictably run into this error either when using 
> `pyspark.sql.DataFrame.show()` or `pyspark.sql.DataFrame.toPandas()`
> error log starts with  (truncated) :
> {noformat}
> 17/07/12 16:03:09 ERROR CodeGenerator: failed to compile: 
> org.codehaus.janino.JaninoRuntimeException: Code of method 
> "apply_47$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V"
>  of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection"
>  grows beyond 64 KB
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */   return new SpecificUnsafeProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificUnsafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */   private scala.collection.immutable.Set set;
> /* 009 */   private scala.collection.immutable.Set set1;
> /* 010 */   private scala.collection.immutable.Set set2;
> /* 011 */   private scala.collection.immutable.Set set3;
> /* 012 */   private UTF8String.IntWrapper wrapper;
> /* 013 */   private UTF8String.IntWrapper wrapper1;
> /* 014 */   private scala.collection.immutable.Set set4;
> /* 015 */   private UTF8String.IntWrapper wrapper2;
> /* 016 */   private UTF8String.IntWrapper wrapper3;
> /* 017 */   private scala.collection.immutable.Set set5;
> /* 018 */   private scala.collection.immutable.Set set6;
> /* 019 */   private scala.collection.immutable.Set set7;
> /* 020 */   private UTF8String.IntWrapper wrapper4;
> /* 021 */   private UTF8String.IntWrapper wrapper5;
> /* 022 */   private scala.collection.immutable.Set set8;
> /* 023 */   private UTF8String.IntWrapper wrapper6;
> /* 024 */   private UTF8String.IntWrapper wrapper7;
> /* 025 */   private scala.collection.immutable.Set set9;
> /* 026 */   private scala.collection.immutable.Set set10;
> /* 027 */   private scala.collection.immutable.Set set11;
> /* 028 */   private UTF8String.IntWrapper wrapper8;
> /* 029 */   private UTF8String.IntWrapper wrapper9;
> /* 030 */   private scala.collection.immutable.Set set12;
> /* 031 */   private UTF8String.IntWrapper wrapper10;
> /* 032 */   private UTF8String.IntWrapper wrapper11;
> /* 033 */   private scala.collection.immutable.Set set13;
> /* 034 */   private scala.collection.immutable.Set set14;
> /* 035 */   private scala.collection.immutable.Set set15;
> /* 036 */   private UTF8String.IntWrapper wrapper12;
> /* 037 */   private UTF8String.IntWrapper wrapper13;
> /* 038 */   private scala.collection.immutable.Set set16;
> /* 039 */   private UTF8String.IntWrapper wrapper14;
> /* 040 */   private UTF8String.IntWrapper wrapper15;
> /* 041 */   private scala.collection.immutable.Set set17;
> /* 042 */   private scala.collection.immutable.Set set18;
> /* 043 */   private scala.collection.immutable.Set set19;
> /* 044 */   private UTF8String.IntWrapper wrapper16;
> /* 045 */   private UTF8String.IntWrapper wrapper17;
> /* 046 */   private scala.collection.immutable.Set set20;
> /* 047 */   private UTF8String.IntWrapper wrapper18;
> /* 048 */   private UTF8String.IntWrapper wrapper19;
> /* 049 */   private scala.collection.immutable.Set set21;
> /* 050 */   private scala.collection.immutable.Set set22;
> /* 051 */   private scala.collection.immutable.Set set23;
> /* 052 */   private UTF8String.IntWrapper wrapper20;
> /* 053 */   private UTF8String.IntWrapper wrapper21;
> /* 054 */   private scala.collection.immutable.Set set24;
> /* 055 */   private UTF8String.IntWrapper wrapper22;
> /* 056 */   private UTF8String.IntWrapper wrapper23;
> /* 057 */   private scala.collection.immutable.Set set25;
> /* 058 */   private scala.collection.immutable.Set set26;
> /* 059 */   private scala.collection.immutable.Set set27;
> /* 060 */   private UTF8String.IntWrapper wrapper24;
> /* 061 */   private UTF8String.IntWrapper wrapper25;
> /* 062 */   private scala.collection.immutable.Set set28;
> /* 063 */   

[jira] [Updated] (SPARK-21393) spark (pyspark) crashes unpredictably when using show() or toPandas()

2017-07-13 Thread Kazuaki Ishizaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki updated SPARK-21393:
-
Affects Version/s: 2.2.0

> spark (pyspark) crashes unpredictably when using show() or toPandas()
> -
>
> Key: SPARK-21393
> URL: https://issues.apache.org/jira/browse/SPARK-21393
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.1.1, 2.2.0
> Environment: Windows 10
> python 2.7
>Reporter: Zahra
> Attachments: Data.zip, working_ST_pyspark.py
>
>
> unpredictably run into this error either when using 
> `pyspark.sql.DataFrame.show()` or `pyspark.sql.DataFrame.toPandas()`
> error log starts with  (truncated) :
> {noformat}
> 17/07/12 16:03:09 ERROR CodeGenerator: failed to compile: 
> org.codehaus.janino.JaninoRuntimeException: Code of method 
> "apply_47$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V"
>  of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection"
>  grows beyond 64 KB
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */   return new SpecificUnsafeProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificUnsafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */   private scala.collection.immutable.Set set;
> /* 009 */   private scala.collection.immutable.Set set1;
> /* 010 */   private scala.collection.immutable.Set set2;
> /* 011 */   private scala.collection.immutable.Set set3;
> /* 012 */   private UTF8String.IntWrapper wrapper;
> /* 013 */   private UTF8String.IntWrapper wrapper1;
> /* 014 */   private scala.collection.immutable.Set set4;
> /* 015 */   private UTF8String.IntWrapper wrapper2;
> /* 016 */   private UTF8String.IntWrapper wrapper3;
> /* 017 */   private scala.collection.immutable.Set set5;
> /* 018 */   private scala.collection.immutable.Set set6;
> /* 019 */   private scala.collection.immutable.Set set7;
> /* 020 */   private UTF8String.IntWrapper wrapper4;
> /* 021 */   private UTF8String.IntWrapper wrapper5;
> /* 022 */   private scala.collection.immutable.Set set8;
> /* 023 */   private UTF8String.IntWrapper wrapper6;
> /* 024 */   private UTF8String.IntWrapper wrapper7;
> /* 025 */   private scala.collection.immutable.Set set9;
> /* 026 */   private scala.collection.immutable.Set set10;
> /* 027 */   private scala.collection.immutable.Set set11;
> /* 028 */   private UTF8String.IntWrapper wrapper8;
> /* 029 */   private UTF8String.IntWrapper wrapper9;
> /* 030 */   private scala.collection.immutable.Set set12;
> /* 031 */   private UTF8String.IntWrapper wrapper10;
> /* 032 */   private UTF8String.IntWrapper wrapper11;
> /* 033 */   private scala.collection.immutable.Set set13;
> /* 034 */   private scala.collection.immutable.Set set14;
> /* 035 */   private scala.collection.immutable.Set set15;
> /* 036 */   private UTF8String.IntWrapper wrapper12;
> /* 037 */   private UTF8String.IntWrapper wrapper13;
> /* 038 */   private scala.collection.immutable.Set set16;
> /* 039 */   private UTF8String.IntWrapper wrapper14;
> /* 040 */   private UTF8String.IntWrapper wrapper15;
> /* 041 */   private scala.collection.immutable.Set set17;
> /* 042 */   private scala.collection.immutable.Set set18;
> /* 043 */   private scala.collection.immutable.Set set19;
> /* 044 */   private UTF8String.IntWrapper wrapper16;
> /* 045 */   private UTF8String.IntWrapper wrapper17;
> /* 046 */   private scala.collection.immutable.Set set20;
> /* 047 */   private UTF8String.IntWrapper wrapper18;
> /* 048 */   private UTF8String.IntWrapper wrapper19;
> /* 049 */   private scala.collection.immutable.Set set21;
> /* 050 */   private scala.collection.immutable.Set set22;
> /* 051 */   private scala.collection.immutable.Set set23;
> /* 052 */   private UTF8String.IntWrapper wrapper20;
> /* 053 */   private UTF8String.IntWrapper wrapper21;
> /* 054 */   private scala.collection.immutable.Set set24;
> /* 055 */   private UTF8String.IntWrapper wrapper22;
> /* 056 */   private UTF8String.IntWrapper wrapper23;
> /* 057 */   private scala.collection.immutable.Set set25;
> /* 058 */   private scala.collection.immutable.Set set26;
> /* 059 */   private scala.collection.immutable.Set set27;
> /* 060 */   private UTF8String.IntWrapper wrapper24;
> /* 061 */   private UTF8String.IntWrapper wrapper25;
> /* 062 */   private scala.collection.immutable.Set set28;
> /* 063 */   private UTF8String.IntWrapper wrapper26;
> /* 064 */   private UTF8String.IntWrapper wrapper27;
> /* 065 */   private scala.collection.immutable.Set set29;
> /* 066 */   private scala.collection.immu

[jira] [Commented] (SPARK-21391) Cannot convert a Seq of Map whose value type is again a seq, into a dataset

2017-07-13 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085939#comment-16085939
 ] 

Kazuaki Ishizaki commented on SPARK-21391:
--

I created [a PR|https://github.com/apache/spark/pull/18626] to solve this 
problem in Spark 2.1.

> Cannot convert a Seq of Map whose value type is again a seq, into a dataset 
> 
>
> Key: SPARK-21391
> URL: https://issues.apache.org/jira/browse/SPARK-21391
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
> Environment: Seen on mac OSX, scala 2.11, java 8
>Reporter: indraneel rao
>
> There is an error while trying to create a dataset from a sequence of Maps 
> whose values contain any kind of collection, even when they are wrapped in a 
> case class. 
> E.g., the following piece of code throws an error:
>
> {code:java}
> case class Values(values: Seq[Double])
> case class ItemProperties(properties:Map[String,Values])
> val l1 = List(ItemProperties(
>   Map(
> "A1" -> Values(Seq(1.0,2.0)),
> "B1" -> Values(Seq(44.0,55.0))
>   )
> ),
>   ItemProperties(
> Map(
>   "A2" -> Values(Seq(123.0,25.0)),
>   "B2" -> Values(Seq(445.0,35.0))
> )
>   )
> )
> l1.toDS().show()
> {code}
> Heres the error:
> 17/07/12 21:59:35 ERROR CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 65, Column 46: Expression "ExternalMapToCatalyst_value_isNull0" is not an 
> rvalue
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */   return new SpecificUnsafeProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificUnsafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */   private boolean resultIsNull;
> /* 009 */   private java.lang.String argValue;
> /* 010 */   private Object[] values;
> /* 011 */   private boolean resultIsNull1;
> /* 012 */   private scala.collection.Seq argValue1;
> /* 013 */   private boolean isNull11;
> /* 014 */   private boolean value11;
> /* 015 */   private boolean isNull12;
> /* 016 */   private InternalRow value12;
> /* 017 */   private boolean isNull13;
> /* 018 */   private InternalRow value13;
> /* 019 */   private UnsafeRow result;
> /* 020 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder holder;
> /* 021 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter;
> /* 022 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter;
> /* 023 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter1;
> /* 024 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter1;
> /* 025 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter2;
> /* 026 */
> /* 027 */   public SpecificUnsafeProjection(Object[] references) {
> /* 028 */ this.references = references;
> /* 029 */
> /* 030 */
> /* 031 */ this.values = null;
> /* 032 */
> /* 033 */
> /* 034 */ isNull11 = false;
> /* 035 */ value11 = false;
> /* 036 */ isNull12 = false;
> /* 037 */ value12 = null;
> /* 038 */ isNull13 = false;
> /* 039 */ value13 = null;
> /* 040 */ result = new UnsafeRow(1);
> /* 041 */ this.holder = new 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(result, 32);
> /* 042 */ this.rowWriter = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1);
> /* 043 */ this.arrayWriter = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 044 */ this.arrayWriter1 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 045 */ this.rowWriter1 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1);
> /* 046 */ this.arrayWriter2 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 047 */
> /* 048 */   }
> /* 049 */
> /* 050 */   public void initialize(int partitionIndex) {
> /* 051 */
> /* 052 */   }
> /* 053 */
> /* 054 */
> /* 055 */   private void evalIfTrueExpr(InternalRow i) {
> /* 056 */ final InternalRow value7 = null;
> /* 057 */ isNull12 = true;
> /* 058 */ value12 = value7;
> /* 059 */   }
> /* 060 */
> /* 061 */
> /* 062 */   private void evalIfCondExpr(InternalRow i) {
> /* 063 */
> /* 064 */ isNull11 = false;
> /* 065 */ value11 = ExternalMapToCatalyst_value_isNull0;
> /* 066 */   }
> /* 067 */
> /* 068 */
> /* 069 */   private void evalIfF

[jira] [Updated] (SPARK-21390) Dataset filter api inconsistency

2017-07-13 Thread Kazuaki Ishizaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki updated SPARK-21390:
-
Affects Version/s: 2.1.0

> Dataset filter api inconsistency
> 
>
> Key: SPARK-21390
> URL: https://issues.apache.org/jira/browse/SPARK-21390
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1, 2.1.0, 2.2.0
>Reporter: Gheorghe Gheorghe
>Priority: Minor
>
> Hello everybody, 
> I've encountered a strange situation with the spark-shell.
> When I run the code below in my IDE, the second test case prints the expected 
> count "1". However, when I run the same code using the spark-shell, the 
> second test case returns a count of 0. 
> I've made sure that I'm running scala 2.11.8 and spark 2.0.1 in both my IDE 
> and spark-shell. 
> {code:java}
>   import org.apache.spark.sql.Dataset
>   case class SomeClass(field1:String, field2:String)
>   val filterCondition: Seq[SomeClass] = Seq( SomeClass("00", "01") )
>   // Test 1
>   val filterMe1: Dataset[SomeClass] = Seq( SomeClass("00", "01") ).toDS
>   
>   println("Works fine!" +filterMe1.filter(filterCondition.contains(_)).count)
>   
>   // Test 2
>   case class OtherClass(field1:String, field2:String)
>   
>   val filterMe2 = Seq( OtherClass("00", "01"), OtherClass("00", "02")).toDS
>   println("Fail, count should return 1: " + filterMe2.filter(x=> 
> filterCondition.contains(SomeClass(x.field1, x.field2))).count)
> {code}
> Note if I transform the dataset first I get 1 back as expected.
> {code:java}
>  println(filterMe2.map(x=> SomeClass(x.field1, 
> x.field2)).filter(filterCondition.contains(_)).count)
> {code}
> Is this a bug? I can see that this filter function has been marked as 
> experimental 
> https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21390) Dataset filter api inconsistency

2017-07-13 Thread Kazuaki Ishizaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki updated SPARK-21390:
-
Affects Version/s: 2.2.0

> Dataset filter api inconsistency
> 
>
> Key: SPARK-21390
> URL: https://issues.apache.org/jira/browse/SPARK-21390
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1, 2.1.0, 2.2.0
>Reporter: Gheorghe Gheorghe
>Priority: Minor
>
> Hello everybody, 
> I've encountered a strange situation with the spark-shell.
> When I run the code below in my IDE, the second test case prints the expected 
> count "1". However, when I run the same code using the spark-shell, the 
> second test case returns a count of 0. 
> I've made sure that I'm running scala 2.11.8 and spark 2.0.1 in both my IDE 
> and spark-shell. 
> {code:java}
>   import org.apache.spark.sql.Dataset
>   case class SomeClass(field1:String, field2:String)
>   val filterCondition: Seq[SomeClass] = Seq( SomeClass("00", "01") )
>   // Test 1
>   val filterMe1: Dataset[SomeClass] = Seq( SomeClass("00", "01") ).toDS
>   
>   println("Works fine!" +filterMe1.filter(filterCondition.contains(_)).count)
>   
>   // Test 2
>   case class OtherClass(field1:String, field2:String)
>   
>   val filterMe2 = Seq( OtherClass("00", "01"), OtherClass("00", "02")).toDS
>   println("Fail, count should return 1: " + filterMe2.filter(x=> 
> filterCondition.contains(SomeClass(x.field1, x.field2))).count)
> {code}
> Note if I transform the dataset first I get 1 back as expected.
> {code:java}
>  println(filterMe2.map(x=> SomeClass(x.field1, 
> x.field2)).filter(filterCondition.contains(_)).count)
> {code}
> Is this a bug? I can see that this filter function has been marked as 
> experimental 
> https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21393) spark (pyspark) crashes unpredictably when using show() or toPandas()

2017-07-13 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085726#comment-16085726
 ] 

Kazuaki Ishizaki commented on SPARK-21393:
--

This program seems to require 7 CSV files to run. Could you 
please attach these CSV files?

> spark (pyspark) crashes unpredictably when using show() or toPandas()
> -
>
> Key: SPARK-21393
> URL: https://issues.apache.org/jira/browse/SPARK-21393
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.1.1
> Environment: Windows 10
> python 2.7
>Reporter: Zahra
> Attachments: working_ST_pyspark.py
>
>
> unpredictably run into this error either when using 
> `pyspark.sql.DataFrame.show()` or `pyspark.sql.DataFrame.toPandas()`
> error log starts with  (truncated) :
> {noformat}
> 17/07/12 16:03:09 ERROR CodeGenerator: failed to compile: 
> org.codehaus.janino.JaninoRuntimeException: Code of method 
> "apply_47$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V"
>  of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection"
>  grows beyond 64 KB
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */   return new SpecificUnsafeProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificUnsafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */   private scala.collection.immutable.Set set;
> /* 009 */   private scala.collection.immutable.Set set1;
> /* 010 */   private scala.collection.immutable.Set set2;
> /* 011 */   private scala.collection.immutable.Set set3;
> /* 012 */   private UTF8String.IntWrapper wrapper;
> /* 013 */   private UTF8String.IntWrapper wrapper1;
> /* 014 */   private scala.collection.immutable.Set set4;
> /* 015 */   private UTF8String.IntWrapper wrapper2;
> /* 016 */   private UTF8String.IntWrapper wrapper3;
> /* 017 */   private scala.collection.immutable.Set set5;
> /* 018 */   private scala.collection.immutable.Set set6;
> /* 019 */   private scala.collection.immutable.Set set7;
> /* 020 */   private UTF8String.IntWrapper wrapper4;
> /* 021 */   private UTF8String.IntWrapper wrapper5;
> /* 022 */   private scala.collection.immutable.Set set8;
> /* 023 */   private UTF8String.IntWrapper wrapper6;
> /* 024 */   private UTF8String.IntWrapper wrapper7;
> /* 025 */   private scala.collection.immutable.Set set9;
> /* 026 */   private scala.collection.immutable.Set set10;
> /* 027 */   private scala.collection.immutable.Set set11;
> /* 028 */   private UTF8String.IntWrapper wrapper8;
> /* 029 */   private UTF8String.IntWrapper wrapper9;
> /* 030 */   private scala.collection.immutable.Set set12;
> /* 031 */   private UTF8String.IntWrapper wrapper10;
> /* 032 */   private UTF8String.IntWrapper wrapper11;
> /* 033 */   private scala.collection.immutable.Set set13;
> /* 034 */   private scala.collection.immutable.Set set14;
> /* 035 */   private scala.collection.immutable.Set set15;
> /* 036 */   private UTF8String.IntWrapper wrapper12;
> /* 037 */   private UTF8String.IntWrapper wrapper13;
> /* 038 */   private scala.collection.immutable.Set set16;
> /* 039 */   private UTF8String.IntWrapper wrapper14;
> /* 040 */   private UTF8String.IntWrapper wrapper15;
> /* 041 */   private scala.collection.immutable.Set set17;
> /* 042 */   private scala.collection.immutable.Set set18;
> /* 043 */   private scala.collection.immutable.Set set19;
> /* 044 */   private UTF8String.IntWrapper wrapper16;
> /* 045 */   private UTF8String.IntWrapper wrapper17;
> /* 046 */   private scala.collection.immutable.Set set20;
> /* 047 */   private UTF8String.IntWrapper wrapper18;
> /* 048 */   private UTF8String.IntWrapper wrapper19;
> /* 049 */   private scala.collection.immutable.Set set21;
> /* 050 */   private scala.collection.immutable.Set set22;
> /* 051 */   private scala.collection.immutable.Set set23;
> /* 052 */   private UTF8String.IntWrapper wrapper20;
> /* 053 */   private UTF8String.IntWrapper wrapper21;
> /* 054 */   private scala.collection.immutable.Set set24;
> /* 055 */   private UTF8String.IntWrapper wrapper22;
> /* 056 */   private UTF8String.IntWrapper wrapper23;
> /* 057 */   private scala.collection.immutable.Set set25;
> /* 058 */   private scala.collection.immutable.Set set26;
> /* 059 */   private scala.collection.immutable.Set set27;
> /* 060 */   private UTF8String.IntWrapper wrapper24;
> /* 061 */   private UTF8String.IntWrapper wrapper25;
> /* 062 */   private scala.collection.immutable.Set set28;
> /* 063 */   private UTF8String.IntWrapper wrapper26;
> /* 064 */   private UTF8String.In

[jira] [Commented] (SPARK-21391) Cannot convert a Seq of Map whose value type is again a seq, into a dataset

2017-07-13 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085342#comment-16085342
 ] 

Kazuaki Ishizaki commented on SPARK-21391:
--

[~neelrr] Do you want this fix in a future 2.1 release? If so, I will make 
a PR to backport it.

> Cannot convert a Seq of Map whose value type is again a seq, into a dataset 
> 
>
> Key: SPARK-21391
> URL: https://issues.apache.org/jira/browse/SPARK-21391
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
> Environment: Seen on mac OSX, scala 2.11, java 8
>Reporter: indraneel rao
>
> There is an error while trying to create a dataset from a sequence of Maps, 
> whose values have any kind of collections. Even when they are wrapped in a 
> case class. 
> Eg : The following piece of code throws an error:
>
> {code:java}
> case class Values(values: Seq[Double])
> case class ItemProperties(properties:Map[String,Values])
> val l1 = List(ItemProperties(
>   Map(
> "A1" -> Values(Seq(1.0,2.0)),
> "B1" -> Values(Seq(44.0,55.0))
>   )
> ),
>   ItemProperties(
> Map(
>   "A2" -> Values(Seq(123.0,25.0)),
>   "B2" -> Values(Seq(445.0,35.0))
> )
>   )
> )
> l1.toDS().show()
> {code}
> Heres the error:
> 17/07/12 21:59:35 ERROR CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 65, Column 46: Expression "ExternalMapToCatalyst_value_isNull0" is not an 
> rvalue
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */   return new SpecificUnsafeProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificUnsafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */   private boolean resultIsNull;
> /* 009 */   private java.lang.String argValue;
> /* 010 */   private Object[] values;
> /* 011 */   private boolean resultIsNull1;
> /* 012 */   private scala.collection.Seq argValue1;
> /* 013 */   private boolean isNull11;
> /* 014 */   private boolean value11;
> /* 015 */   private boolean isNull12;
> /* 016 */   private InternalRow value12;
> /* 017 */   private boolean isNull13;
> /* 018 */   private InternalRow value13;
> /* 019 */   private UnsafeRow result;
> /* 020 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder holder;
> /* 021 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter;
> /* 022 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter;
> /* 023 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter1;
> /* 024 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter1;
> /* 025 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter2;
> /* 026 */
> /* 027 */   public SpecificUnsafeProjection(Object[] references) {
> /* 028 */ this.references = references;
> /* 029 */
> /* 030 */
> /* 031 */ this.values = null;
> /* 032 */
> /* 033 */
> /* 034 */ isNull11 = false;
> /* 035 */ value11 = false;
> /* 036 */ isNull12 = false;
> /* 037 */ value12 = null;
> /* 038 */ isNull13 = false;
> /* 039 */ value13 = null;
> /* 040 */ result = new UnsafeRow(1);
> /* 041 */ this.holder = new 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(result, 32);
> /* 042 */ this.rowWriter = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1);
> /* 043 */ this.arrayWriter = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 044 */ this.arrayWriter1 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 045 */ this.rowWriter1 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1);
> /* 046 */ this.arrayWriter2 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 047 */
> /* 048 */   }
> /* 049 */
> /* 050 */   public void initialize(int partitionIndex) {
> /* 051 */
> /* 052 */   }
> /* 053 */
> /* 054 */
> /* 055 */   private void evalIfTrueExpr(InternalRow i) {
> /* 056 */ final InternalRow value7 = null;
> /* 057 */ isNull12 = true;
> /* 058 */ value12 = value7;
> /* 059 */   }
> /* 060 */
> /* 061 */
> /* 062 */   private void evalIfCondExpr(InternalRow i) {
> /* 063 */
> /* 064 */ isNull11 = false;
> /* 065 */ value11 = ExternalMapToCatalyst_value_isNull0;
> /* 066 */   }
> /* 067 */
> /* 068 */
> /* 069 */   private void

[jira] [Comment Edited] (SPARK-21391) Cannot convert a Seq of Map whose value type is again a seq, into a dataset

2017-07-12 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085099#comment-16085099
 ] 

Kazuaki Ishizaki edited comment on SPARK-21391 at 7/13/17 3:42 AM:
---

[~hyukjin.kwon] I think that SPARK-19254 and/or SPARK-19104 fixed this issue.


was (Author: kiszk):
[~hyukjin.kwon] I think that 
[SPARK-19254|https://issues.apache.org/jira/browse/SPARK-19254] and/or 
[SPARK-19104|https://issues.apache.org/jira/browse/SPARK-19104] fixed this 
issue.

> Cannot convert a Seq of Map whose value type is again a seq, into a dataset 
> 
>
> Key: SPARK-21391
> URL: https://issues.apache.org/jira/browse/SPARK-21391
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
> Environment: Seen on mac OSX, scala 2.11, java 8
>Reporter: indraneel rao
>
> There is an error while trying to create a dataset from a sequence of Maps, 
> whose values have any kind of collections. Even when they are wrapped in a 
> case class. 
> Eg : The following piece of code throws an error:
>
> {code:java}
> case class Values(values: Seq[Double])
> case class ItemProperties(properties:Map[String,Values])
> val l1 = List(ItemProperties(
>   Map(
> "A1" -> Values(Seq(1.0,2.0)),
> "B1" -> Values(Seq(44.0,55.0))
>   )
> ),
>   ItemProperties(
> Map(
>   "A2" -> Values(Seq(123.0,25.0)),
>   "B2" -> Values(Seq(445.0,35.0))
> )
>   )
> )
> l1.toDS().show()
> {code}
> Heres the error:
> 17/07/12 21:59:35 ERROR CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 65, Column 46: Expression "ExternalMapToCatalyst_value_isNull0" is not an 
> rvalue
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */   return new SpecificUnsafeProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificUnsafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */   private boolean resultIsNull;
> /* 009 */   private java.lang.String argValue;
> /* 010 */   private Object[] values;
> /* 011 */   private boolean resultIsNull1;
> /* 012 */   private scala.collection.Seq argValue1;
> /* 013 */   private boolean isNull11;
> /* 014 */   private boolean value11;
> /* 015 */   private boolean isNull12;
> /* 016 */   private InternalRow value12;
> /* 017 */   private boolean isNull13;
> /* 018 */   private InternalRow value13;
> /* 019 */   private UnsafeRow result;
> /* 020 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder holder;
> /* 021 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter;
> /* 022 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter;
> /* 023 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter1;
> /* 024 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter1;
> /* 025 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter2;
> /* 026 */
> /* 027 */   public SpecificUnsafeProjection(Object[] references) {
> /* 028 */ this.references = references;
> /* 029 */
> /* 030 */
> /* 031 */ this.values = null;
> /* 032 */
> /* 033 */
> /* 034 */ isNull11 = false;
> /* 035 */ value11 = false;
> /* 036 */ isNull12 = false;
> /* 037 */ value12 = null;
> /* 038 */ isNull13 = false;
> /* 039 */ value13 = null;
> /* 040 */ result = new UnsafeRow(1);
> /* 041 */ this.holder = new 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(result, 32);
> /* 042 */ this.rowWriter = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1);
> /* 043 */ this.arrayWriter = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 044 */ this.arrayWriter1 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 045 */ this.rowWriter1 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1);
> /* 046 */ this.arrayWriter2 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 047 */
> /* 048 */   }
> /* 049 */
> /* 050 */   public void initialize(int partitionIndex) {
> /* 051 */
> /* 052 */   }
> /* 053 */
> /* 054 */
> /* 055 */   private void evalIfTrueExpr(InternalRow i) {
> /* 056 */ final InternalRow value7 = null;
> /* 057 */ isNull12 = true;
> /* 058 */ value12 = value7;
> /* 059 */   }
> /* 060 */
> /* 061 */
> /

[jira] [Commented] (SPARK-21391) Cannot convert a Seq of Map whose value type is again a seq, into a dataset

2017-07-12 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085099#comment-16085099
 ] 

Kazuaki Ishizaki commented on SPARK-21391:
--

[~hyukjin.kwon] I think that 
[SPARK-19254|https://issues.apache.org/jira/browse/SPARK-19254] and/or 
[SPARK-19104|https://issues.apache.org/jira/browse/SPARK-19104] fixed this 
issue.

> Cannot convert a Seq of Map whose value type is again a seq, into a dataset 
> 
>
> Key: SPARK-21391
> URL: https://issues.apache.org/jira/browse/SPARK-21391
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
> Environment: Seen on mac OSX, scala 2.11, java 8
>Reporter: indraneel rao
>
> There is an error while trying to create a dataset from a sequence of Maps, 
> whose values have any kind of collections. Even when they are wrapped in a 
> case class. 
> Eg : The following piece of code throws an error:
>
> {code:java}
> case class Values(values: Seq[Double])
> case class ItemProperties(properties:Map[String,Values])
> val l1 = List(ItemProperties(
>   Map(
> "A1" -> Values(Seq(1.0,2.0)),
> "B1" -> Values(Seq(44.0,55.0))
>   )
> ),
>   ItemProperties(
> Map(
>   "A2" -> Values(Seq(123.0,25.0)),
>   "B2" -> Values(Seq(445.0,35.0))
> )
>   )
> )
> l1.toDS().show()
> {code}
> Heres the error:
> 17/07/12 21:59:35 ERROR CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 65, Column 46: Expression "ExternalMapToCatalyst_value_isNull0" is not an 
> rvalue
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */   return new SpecificUnsafeProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificUnsafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */   private boolean resultIsNull;
> /* 009 */   private java.lang.String argValue;
> /* 010 */   private Object[] values;
> /* 011 */   private boolean resultIsNull1;
> /* 012 */   private scala.collection.Seq argValue1;
> /* 013 */   private boolean isNull11;
> /* 014 */   private boolean value11;
> /* 015 */   private boolean isNull12;
> /* 016 */   private InternalRow value12;
> /* 017 */   private boolean isNull13;
> /* 018 */   private InternalRow value13;
> /* 019 */   private UnsafeRow result;
> /* 020 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder holder;
> /* 021 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter;
> /* 022 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter;
> /* 023 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter1;
> /* 024 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter1;
> /* 025 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter2;
> /* 026 */
> /* 027 */   public SpecificUnsafeProjection(Object[] references) {
> /* 028 */ this.references = references;
> /* 029 */
> /* 030 */
> /* 031 */ this.values = null;
> /* 032 */
> /* 033 */
> /* 034 */ isNull11 = false;
> /* 035 */ value11 = false;
> /* 036 */ isNull12 = false;
> /* 037 */ value12 = null;
> /* 038 */ isNull13 = false;
> /* 039 */ value13 = null;
> /* 040 */ result = new UnsafeRow(1);
> /* 041 */ this.holder = new 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(result, 32);
> /* 042 */ this.rowWriter = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1);
> /* 043 */ this.arrayWriter = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 044 */ this.arrayWriter1 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 045 */ this.rowWriter1 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1);
> /* 046 */ this.arrayWriter2 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 047 */
> /* 048 */   }
> /* 049 */
> /* 050 */   public void initialize(int partitionIndex) {
> /* 051 */
> /* 052 */   }
> /* 053 */
> /* 054 */
> /* 055 */   private void evalIfTrueExpr(InternalRow i) {
> /* 056 */ final InternalRow value7 = null;
> /* 057 */ isNull12 = true;
> /* 058 */ value12 = value7;
> /* 059 */   }
> /* 060 */
> /* 061 */
> /* 062 */   private void evalIfCondExpr(InternalRow i) {
> /* 063 */
> /* 064 */ isNull11 = false;
> /* 065 */ value11 = ExternalMapToCatalyst_

[jira] [Comment Edited] (SPARK-21391) Cannot convert a Seq of Map whose value type is again a seq, into a dataset

2017-07-12 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084333#comment-16084333
 ] 

Kazuaki Ishizaki edited comment on SPARK-21391 at 7/12/17 5:19 PM:
---

This program works with the master branch or Spark 2.2. Would it be possible to use 
Spark 2.2?

{code}
++
|  properties|
++
|Map(A1 -> [Wrappe...|
|Map(A2 -> [Wrappe...|
++
{code}



was (Author: kiszk):
This program works with the master and Spark 2.2. Would it be possible to use 
Spark 2.2?

{code}
++
|  properties|
++
|Map(A1 -> [Wrappe...|
|Map(A2 -> [Wrappe...|
++
{code}


> Cannot convert a Seq of Map whose value type is again a seq, into a dataset 
> 
>
> Key: SPARK-21391
> URL: https://issues.apache.org/jira/browse/SPARK-21391
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
> Environment: Seen on mac OSX, scala 2.11, java 8
>Reporter: indraneel rao
>
> There is an error while trying to create a dataset from a sequence of Maps, 
> whose values have any kind of collections. Even when they are wrapped in a 
> case class. 
> Eg : The following piece of code throws an error:
>
> {code:java}
> case class Values(values: Seq[Double])
> case class ItemProperties(properties:Map[String,Values])
> val l1 = List(ItemProperties(
>   Map(
> "A1" -> Values(Seq(1.0,2.0)),
> "B1" -> Values(Seq(44.0,55.0))
>   )
> ),
>   ItemProperties(
> Map(
>   "A2" -> Values(Seq(123.0,25.0)),
>   "B2" -> Values(Seq(445.0,35.0))
> )
>   )
> )
> l1.toDS().show()
> {code}
> Heres the error:
> 17/07/12 21:59:35 ERROR CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 65, Column 46: Expression "ExternalMapToCatalyst_value_isNull0" is not an 
> rvalue
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */   return new SpecificUnsafeProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificUnsafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */   private boolean resultIsNull;
> /* 009 */   private java.lang.String argValue;
> /* 010 */   private Object[] values;
> /* 011 */   private boolean resultIsNull1;
> /* 012 */   private scala.collection.Seq argValue1;
> /* 013 */   private boolean isNull11;
> /* 014 */   private boolean value11;
> /* 015 */   private boolean isNull12;
> /* 016 */   private InternalRow value12;
> /* 017 */   private boolean isNull13;
> /* 018 */   private InternalRow value13;
> /* 019 */   private UnsafeRow result;
> /* 020 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder holder;
> /* 021 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter;
> /* 022 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter;
> /* 023 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter1;
> /* 024 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter1;
> /* 025 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter2;
> /* 026 */
> /* 027 */   public SpecificUnsafeProjection(Object[] references) {
> /* 028 */ this.references = references;
> /* 029 */
> /* 030 */
> /* 031 */ this.values = null;
> /* 032 */
> /* 033 */
> /* 034 */ isNull11 = false;
> /* 035 */ value11 = false;
> /* 036 */ isNull12 = false;
> /* 037 */ value12 = null;
> /* 038 */ isNull13 = false;
> /* 039 */ value13 = null;
> /* 040 */ result = new UnsafeRow(1);
> /* 041 */ this.holder = new 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(result, 32);
> /* 042 */ this.rowWriter = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1);
> /* 043 */ this.arrayWriter = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 044 */ this.arrayWriter1 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 045 */ this.rowWriter1 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1);
> /* 046 */ this.arrayWriter2 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 047 */
> /* 048 */   }
> /* 049 */
> /* 050 */   public void initialize(int partitionIndex) {
> /* 051 */
> /* 052 */   }
> /* 053 */
> /* 0

[jira] [Comment Edited] (SPARK-21391) Cannot convert a Seq of Map whose value type is again a seq, into a dataset

2017-07-12 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084333#comment-16084333
 ] 

Kazuaki Ishizaki edited comment on SPARK-21391 at 7/12/17 5:19 PM:
---

This program works with the master and Spark 2.2. Would it be possible to use 
Spark 2.2?

{code}
++
|  properties|
++
|Map(A1 -> [Wrappe...|
|Map(A2 -> [Wrappe...|
++
{code}



was (Author: kiszk):
This program works with the master.

{code}
++
|  properties|
++
|Map(A1 -> [Wrappe...|
|Map(A2 -> [Wrappe...|
++
{code}


> Cannot convert a Seq of Map whose value type is again a seq, into a dataset 
> 
>
> Key: SPARK-21391
> URL: https://issues.apache.org/jira/browse/SPARK-21391
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
> Environment: Seen on mac OSX, scala 2.11, java 8
>Reporter: indraneel rao
>
> There is an error while trying to create a dataset from a sequence of Maps, 
> whose values have any kind of collections. Even when they are wrapped in a 
> case class. 
> Eg : The following piece of code throws an error:
>
> {code:java}
> case class Values(values: Seq[Double])
> case class ItemProperties(properties:Map[String,Values])
> val l1 = List(ItemProperties(
>   Map(
> "A1" -> Values(Seq(1.0,2.0)),
> "B1" -> Values(Seq(44.0,55.0))
>   )
> ),
>   ItemProperties(
> Map(
>   "A2" -> Values(Seq(123.0,25.0)),
>   "B2" -> Values(Seq(445.0,35.0))
> )
>   )
> )
> l1.toDS().show()
> {code}
> Heres the error:
> 17/07/12 21:59:35 ERROR CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 65, Column 46: Expression "ExternalMapToCatalyst_value_isNull0" is not an 
> rvalue
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */   return new SpecificUnsafeProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificUnsafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */   private boolean resultIsNull;
> /* 009 */   private java.lang.String argValue;
> /* 010 */   private Object[] values;
> /* 011 */   private boolean resultIsNull1;
> /* 012 */   private scala.collection.Seq argValue1;
> /* 013 */   private boolean isNull11;
> /* 014 */   private boolean value11;
> /* 015 */   private boolean isNull12;
> /* 016 */   private InternalRow value12;
> /* 017 */   private boolean isNull13;
> /* 018 */   private InternalRow value13;
> /* 019 */   private UnsafeRow result;
> /* 020 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder holder;
> /* 021 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter;
> /* 022 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter;
> /* 023 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter1;
> /* 024 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter1;
> /* 025 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter2;
> /* 026 */
> /* 027 */   public SpecificUnsafeProjection(Object[] references) {
> /* 028 */ this.references = references;
> /* 029 */
> /* 030 */
> /* 031 */ this.values = null;
> /* 032 */
> /* 033 */
> /* 034 */ isNull11 = false;
> /* 035 */ value11 = false;
> /* 036 */ isNull12 = false;
> /* 037 */ value12 = null;
> /* 038 */ isNull13 = false;
> /* 039 */ value13 = null;
> /* 040 */ result = new UnsafeRow(1);
> /* 041 */ this.holder = new 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(result, 32);
> /* 042 */ this.rowWriter = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1);
> /* 043 */ this.arrayWriter = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 044 */ this.arrayWriter1 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 045 */ this.rowWriter1 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1);
> /* 046 */ this.arrayWriter2 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 047 */
> /* 048 */   }
> /* 049 */
> /* 050 */   public void initialize(int partitionIndex) {
> /* 051 */
> /* 052 */   }
> /* 053 */
> /* 054 */
> /* 055 */   private void evalIfTrueExpr(Inter

[jira] [Commented] (SPARK-21391) Cannot convert a Seq of Map whose value type is again a seq, into a dataset

2017-07-12 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084333#comment-16084333
 ] 

Kazuaki Ishizaki commented on SPARK-21391:
--

This program works with the master.

{code}
++
|  properties|
++
|Map(A1 -> [Wrappe...|
|Map(A2 -> [Wrappe...|
++
{code}


> Cannot convert a Seq of Map whose value type is again a seq, into a dataset 
> 
>
> Key: SPARK-21391
> URL: https://issues.apache.org/jira/browse/SPARK-21391
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
> Environment: Seen on mac OSX, scala 2.11, java 8
>Reporter: indraneel rao
>
> There is an error while trying to create a dataset from a sequence of Maps, 
> whose values have any kind of collections. Even when they are wrapped in a 
> case class. 
> Eg : The following piece of code throws an error:
>
> {code:java}
> case class Values(values: Seq[Double])
> case class ItemProperties(properties:Map[String,Values])
> val l1 = List(ItemProperties(
>   Map(
> "A1" -> Values(Seq(1.0,2.0)),
> "B1" -> Values(Seq(44.0,55.0))
>   )
> ),
>   ItemProperties(
> Map(
>   "A2" -> Values(Seq(123.0,25.0)),
>   "B2" -> Values(Seq(445.0,35.0))
> )
>   )
> )
> l1.toDS().show()
> {code}
> Heres the error:
> 17/07/12 21:59:35 ERROR CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 65, Column 46: Expression "ExternalMapToCatalyst_value_isNull0" is not an 
> rvalue
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */   return new SpecificUnsafeProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificUnsafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */   private boolean resultIsNull;
> /* 009 */   private java.lang.String argValue;
> /* 010 */   private Object[] values;
> /* 011 */   private boolean resultIsNull1;
> /* 012 */   private scala.collection.Seq argValue1;
> /* 013 */   private boolean isNull11;
> /* 014 */   private boolean value11;
> /* 015 */   private boolean isNull12;
> /* 016 */   private InternalRow value12;
> /* 017 */   private boolean isNull13;
> /* 018 */   private InternalRow value13;
> /* 019 */   private UnsafeRow result;
> /* 020 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder holder;
> /* 021 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter;
> /* 022 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter;
> /* 023 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter1;
> /* 024 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter1;
> /* 025 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter2;
> /* 026 */
> /* 027 */   public SpecificUnsafeProjection(Object[] references) {
> /* 028 */ this.references = references;
> /* 029 */
> /* 030 */
> /* 031 */ this.values = null;
> /* 032 */
> /* 033 */
> /* 034 */ isNull11 = false;
> /* 035 */ value11 = false;
> /* 036 */ isNull12 = false;
> /* 037 */ value12 = null;
> /* 038 */ isNull13 = false;
> /* 039 */ value13 = null;
> /* 040 */ result = new UnsafeRow(1);
> /* 041 */ this.holder = new 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(result, 32);
> /* 042 */ this.rowWriter = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1);
> /* 043 */ this.arrayWriter = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 044 */ this.arrayWriter1 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 045 */ this.rowWriter1 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1);
> /* 046 */ this.arrayWriter2 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 047 */
> /* 048 */   }
> /* 049 */
> /* 050 */   public void initialize(int partitionIndex) {
> /* 051 */
> /* 052 */   }
> /* 053 */
> /* 054 */
> /* 055 */   private void evalIfTrueExpr(InternalRow i) {
> /* 056 */ final InternalRow value7 = null;
> /* 057 */ isNull12 = true;
> /* 058 */ value12 = value7;
> /* 059 */   }
> /* 060 */
> /* 061 */
> /* 062 */   private void evalIfCondExpr(InternalRow i) {
> /* 063 */
> /* 064 */ isNull11 = false;
> /* 065 */ value11 = ExternalMapToCata

[jira] [Commented] (SPARK-21390) Dataset filter api inconsistency

2017-07-12 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084306#comment-16084306
 ] 

Kazuaki Ishizaki commented on SPARK-21390:
--

Another interesting result with Spark 2.2:
On IDE
{code:java}
{
...
filterMe1.filter(x=> filterCondition.contains(x)).show
filterMe1.filter(x=> filterCondition.contains(SomeClass(x.field1, 
x.field2))).show
}

+--+--+
|field1|field2|
+--+--+
|00|01|
+--+--+

+--+--+
|field1|field2|
+--+--+
|00|01|
+--+--+
{code}

On REPL
{code:java}
...
scala> filterMe1.filter(x => filterCondition.contains(x)).show
+--+--+
|field1|field2|
+--+--+
|00|01|
+--+--+

scala> filterMe1.filter(x => filterCondition.contains(SomeClass(x.field1, 
x.field2))).show
+--+--+
|field1|field2|
+--+--+
+--+--+

scala> print(filterCondition.contains(SomeClass("00", "01")))
true

scala> filterMe1.filter(x => { val c = 
filterCondition.contains(SomeClass(x.field1, x.field2)); print(s"$c\n"); c} 
).show
false
+--+--+
|field1|field2|
+--+--+
+--+--+
{code}
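
A possible workaround, not from the ticket (a sketch that continues from the snippet in the issue description below, so {{filterCondition}} and {{filterMe2}} are assumed to be defined as shown there): compare plain tuples of the fields inside the closure instead of REPL-defined case class instances.

{code}
// Hedged workaround sketch: build a Set of field tuples and filter on those,
// so no REPL-defined case class instance is compared inside the closure.
val filterKeys = filterCondition.map(c => (c.field1, c.field2)).toSet
val count = filterMe2.filter(x => filterKeys.contains((x.field1, x.field2))).count
println(s"count = $count")  // expected: 1
{code}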

> Dataset filter api inconsistency
> 
>
> Key: SPARK-21390
> URL: https://issues.apache.org/jira/browse/SPARK-21390
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1
>Reporter: Gheorghe Gheorghe
>Priority: Minor
>
> Hello everybody, 
> I've encountered a strange situation with the spark-shell.
> When I run the code below in my IDE the second test case prints as expected 
> count "1". However, when I run the same code using the spark-shell in the 
> second test case I get 0 back as a count. 
> I've made sure that I'm running scala 2.11.8 and spark 2.0.1 in both my IDE 
> and spark-shell. 
> {code:java}
>   import org.apache.spark.sql.Dataset
>   case class SomeClass(field1:String, field2:String)
>   val filterCondition: Seq[SomeClass] = Seq( SomeClass("00", "01") )
>   // Test 1
>   val filterMe1: Dataset[SomeClass] = Seq( SomeClass("00", "01") ).toDS
>   
>   println("Works fine!" +filterMe1.filter(filterCondition.contains(_)).count)
>   
>   // Test 2
>   case class OtherClass(field1:String, field2:String)
>   
>   val filterMe2 = Seq( OtherClass("00", "01"), OtherClass("00", "02")).toDS
>   println("Fail, count should return 1: " + filterMe2.filter(x=> 
> filterCondition.contains(SomeClass(x.field1, x.field2))).count)
> {code}
> Note if I transform the dataset first I get 1 back as expected.
> {code:java}
>  println(filterMe2.map(x=> SomeClass(x.field1, 
> x.field2)).filter(filterCondition.contains(_)).count)
> {code}
> Is this a bug? I can see that this filter function has been marked as 
> experimental 
> https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-21390) Dataset filter api inconsistency

2017-07-12 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084306#comment-16084306
 ] 

Kazuaki Ishizaki edited comment on SPARK-21390 at 7/12/17 5:09 PM:
---

Another interesting result with Spark 2.2. Does this happen only for case classes on the REPL?

On IDE
{code:java}
{
...
filterMe1.filter(x=> filterCondition.contains(x)).show
filterMe1.filter(x=> filterCondition.contains(SomeClass(x.field1, 
x.field2))).show
}

+--+--+
|field1|field2|
+--+--+
|00|01|
+--+--+

+--+--+
|field1|field2|
+--+--+
|00|01|
+--+--+
{code}

On REPL
{code:java}
...
scala> filterMe1.filter(x => filterCondition.contains(x)).show
+--+--+
|field1|field2|
+--+--+
|00|01|
+--+--+

scala> filterMe1.filter(x => filterCondition.contains(SomeClass(x.field1, 
x.field2))).show
+--+--+
|field1|field2|
+--+--+
+--+--+

scala> print(filterCondition.contains(SomeClass("00", "01")))
true

scala> filterMe1.filter(x => { val c = 
filterCondition.contains(SomeClass(x.field1, x.field2)); print(s"$c\n"); c} 
).show
false
+--+--+
|field1|field2|
+--+--+
+--+--+

scala> Seq((0, 0), (1, 1), (2, 2)).toDS.filter(x => { val c = Seq((1, 
1)).contains((x._1, x._2)); print(s"$c\n"); c} ).show
false
true
false
+---+---+
| _1| _2|
+---+---+
|  1|  1|
+---+---+
{code}


was (Author: kiszk):
Another interesting results with Spark-2.2:
On IDE
{code:java}
{
...
filterMe1.filter(x=> filterCondition.contains(x)).show
filterMe1.filter(x=> filterCondition.contains(SomeClass(x.field1, 
x.field2))).show
}

+--+--+
|field1|field2|
+--+--+
|00|01|
+--+--+

+--+--+
|field1|field2|
+--+--+
|00|01|
+--+--+
{code}

On REPL
{code:java}
...
scala> filterMe1.filter(x => filterCondition.contains(x)).show
+--+--+
|field1|field2|
+--+--+
|00|01|
+--+--+

scala> filterMe1.filter(x => filterCondition.contains(SomeClass(x.field1, 
x.field2))).show
+--+--+
|field1|field2|
+--+--+
+--+--+

scala> print(filterCondition.contains(SomeClass("00", "01")))
true

scala> filterMe1.filter(x => { val c = 
filterCondition.contains(SomeClass(x.field1, x.field2)); print(s"$c\n"); c} 
).show
false
+--+--+
|field1|field2|
+--+--+
+--+--+
{code}

> Dataset filter api inconsistency
> 
>
> Key: SPARK-21390
> URL: https://issues.apache.org/jira/browse/SPARK-21390
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1
>Reporter: Gheorghe Gheorghe
>Priority: Minor
>
> Hello everybody, 
> I've encountered a strange situation with the spark-shell.
> When I run the code below in my IDE the second test case prints as expected 
> count "1". However, when I run the same code using the spark-shell in the 
> second test case I get 0 back as a count. 
> I've made sure that I'm running scala 2.11.8 and spark 2.0.1 in both my IDE 
> and spark-shell. 
> {code:java}
>   import org.apache.spark.sql.Dataset
>   case class SomeClass(field1:String, field2:String)
>   val filterCondition: Seq[SomeClass] = Seq( SomeClass("00", "01") )
>   // Test 1
>   val filterMe1: Dataset[SomeClass] = Seq( SomeClass("00", "01") ).toDS
>   
>   println("Works fine!" +filterMe1.filter(filterCondition.contains(_)).count)
>   
>   // Test 2
>   case class OtherClass(field1:String, field2:String)
>   
>   val filterMe2 = Seq( OtherClass("00", "01"), OtherClass("00", "02")).toDS
>   println("Fail, count should return 1: " + filterMe2.filter(x=> 
> filterCondition.contains(SomeClass(x.field1, x.field2))).count)
> {code}
> Note if I transform the dataset first I get 1 back as expected.
> {code:java}
>  println(filterMe2.map(x=> SomeClass(x.field1, 
> x.field2)).filter(filterCondition.contains(_)).count)
> {code}
> Is this a bug? I can see that this filter function has been marked as 
> experimental 
> https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21390) Dataset filter api inconsistency

2017-07-12 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084266#comment-16084266
 ] 

Kazuaki Ishizaki commented on SPARK-21390:
--

Thank you for reporting this. I can reproduce this using Spark 2.2, too.

{code:java}
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/
 
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131)
Type in expressions to have them evaluated.
Type :help for more information.

scala> import org.apache.spark.sql.Dataset
import org.apache.spark.sql.Dataset

scala> case class SomeClass(field1:String, field2:String)
defined class SomeClass

scala> val filterCondition: Seq[SomeClass] = Seq( SomeClass("00", "01") )
filterCondition: Seq[SomeClass] = List(SomeClass(00,01))

scala> val filterMe1: Dataset[SomeClass] = Seq( SomeClass("00", "01") ).toDS
filterMe1: org.apache.spark.sql.Dataset[SomeClass] = [field1: string, field2: 
string]

scala> println("Works fine!" 
+filterMe1.filter(filterCondition.contains(_)).count)
Works fine!1

scala> case class OtherClass(field1:String, field2:String)
defined class OtherClass

scala> val filterMe2 = Seq( OtherClass("00", "01"), OtherClass("00", "02")).toDS
filterMe2: org.apache.spark.sql.Dataset[OtherClass] = [field1: string, field2: 
string]

scala> println("Fail, count should return 1: " + filterMe2.filter(x=> 
filterCondition.contains(SomeClass(x.field1, x.field2))).count)
Fail, count should return 1: 0
{code}

> Dataset filter api inconsistency
> 
>
> Key: SPARK-21390
> URL: https://issues.apache.org/jira/browse/SPARK-21390
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1
>Reporter: Gheorghe Gheorghe
>Priority: Minor
>
> Hello everybody, 
> I've encountered a strange situation with the spark-shell.
> When I run the code below in my IDE the second test case prints as expected 
> count "1". However, when I run the same code using the spark-shell in the 
> second test case I get 0 back as a count. 
> I've made sure that I'm running scala 2.11.8 and spark 2.0.1 in both my IDE 
> and spark-shell. 
> {code:java}
>   import org.apache.spark.sql.Dataset
>   case class SomeClass(field1:String, field2:String)
>   val filterCondition: Seq[SomeClass] = Seq( SomeClass("00", "01") )
>   // Test 1
>   val filterMe1: Dataset[SomeClass] = Seq( SomeClass("00", "01") ).toDS
>   
>   println("Works fine!" +filterMe1.filter(filterCondition.contains(_)).count)
>   
>   // Test 2
>   case class OtherClass(field1:String, field2:String)
>   
>   val filterMe2 = Seq( OtherClass("00", "01"), OtherClass("00", "02")).toDS
>   println("Fail, count should return 1: " + filterMe2.filter(x=> 
> filterCondition.contains(SomeClass(x.field1, x.field2))).count)
> {code}
> Note if I transform the dataset first I get 1 back as expected.
> {code:java}
>  println(filterMe2.map(x=> SomeClass(x.field1, 
> x.field2)).filter(filterCondition.contains(_)).count)
> {code}
> Is this a bug? I can see that this filter function has been marked as 
> experimental 
> https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21387) org.apache.spark.memory.TaskMemoryManager.allocatePage causes OOM

2017-07-12 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-21387:


 Summary: org.apache.spark.memory.TaskMemoryManager.allocatePage 
causes OOM
 Key: SPARK-21387
 URL: https://issues.apache.org/jira/browse/SPARK-21387
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.3.0
Reporter: Kazuaki Ishizaki






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-21373) Update Jetty to 9.3.20.v20170531

2017-07-11 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16082426#comment-16082426
 ] 

Kazuaki Ishizaki edited comment on SPARK-21373 at 7/11/17 3:56 PM:
---

Since I have not clarified this yet, I changed the title.


was (Author: kiszk):
Since I have not clarified, I changed the title. I will submit a PR for 
improvement.

> Update Jetty to 9.3.20.v20170531
> 
>
> Key: SPARK-21373
> URL: https://issues.apache.org/jira/browse/SPARK-21373
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Kazuaki Ishizaki
>Priority: Minor
>
> This is derived from https://issues.apache.org/jira/browse/FELIX-5664. 
> [~aroberts] let me know the CVE.
> Spark 2.2 uses jetty 9.3.11.v20160721 that is sensitive to CVE-2017-9735
> * https://nvd.nist.gov/vuln/detail/CVE-2017-9735
> * https://github.com/eclipse/jetty.project/issues/1556
> We should upgrade jetty to 9.3.20.v20170531 that has released to fix the CVE.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21373) Update Jetty to 9.3.20.v20170531

2017-07-11 Thread Kazuaki Ishizaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki updated SPARK-21373:
-
Summary: Update Jetty to 9.3.20.v20170531  (was: Update Jetty to 
9.3.20.v20170531 to fix CVE-2017-9735)

> Update Jetty to 9.3.20.v20170531
> 
>
> Key: SPARK-21373
> URL: https://issues.apache.org/jira/browse/SPARK-21373
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Kazuaki Ishizaki
>Priority: Minor
>
> This is derived from https://issues.apache.org/jira/browse/FELIX-5664. 
> [~aroberts] let me know the CVE.
> Spark 2.2 uses jetty 9.3.11.v20160721 that is sensitive to CVE-2017-9735
> * https://nvd.nist.gov/vuln/detail/CVE-2017-9735
> * https://github.com/eclipse/jetty.project/issues/1556
> We should upgrade jetty to 9.3.20.v20170531 that has released to fix the CVE.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21373) Update Jetty to 9.3.20.v20170531

2017-07-11 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16082426#comment-16082426
 ] 

Kazuaki Ishizaki commented on SPARK-21373:
--

Since I have not clarified this yet, I changed the title. I will submit a PR for 
the improvement.

> Update Jetty to 9.3.20.v20170531
> 
>
> Key: SPARK-21373
> URL: https://issues.apache.org/jira/browse/SPARK-21373
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Kazuaki Ishizaki
>Priority: Minor
>
> This is derived from https://issues.apache.org/jira/browse/FELIX-5664. 
> [~aroberts] let me know the CVE.
> Spark 2.2 uses jetty 9.3.11.v20160721 that is sensitive to CVE-2017-9735
> * https://nvd.nist.gov/vuln/detail/CVE-2017-9735
> * https://github.com/eclipse/jetty.project/issues/1556
> We should upgrade jetty to 9.3.20.v20170531 that has released to fix the CVE.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21373) Update Jetty to 9.3.20.v20170531 to fix CVE-2017-9735

2017-07-11 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-21373:


 Summary: Update Jetty to 9.3.20.v20170531 to fix CVE-2017-9735
 Key: SPARK-21373
 URL: https://issues.apache.org/jira/browse/SPARK-21373
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.2.1, 2.3.0
Reporter: Kazuaki Ishizaki


This is derived from https://issues.apache.org/jira/browse/FELIX-5664

Spark 2.2 uses jetty 9.3.11.v20160721, which is vulnerable to CVE-2017-9735
* https://nvd.nist.gov/vuln/detail/CVE-2017-9735
* https://github.com/eclipse/jetty.project/issues/1556

We should upgrade jetty to 9.3.20.v20170531, which has been released to fix the CVE.
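
For applications that cannot wait for a Spark release with the bump, a hedged sbt sketch of forcing the patched Jetty onto the classpath (my own suggestion, not part of this ticket; the artifact list is illustrative and should be checked against the Jetty modules your build actually pulls in):

{code}
// build.sbt -- override the Jetty version brought in transitively.
dependencyOverrides += "org.eclipse.jetty" % "jetty-server"  % "9.3.20.v20170531"
dependencyOverrides += "org.eclipse.jetty" % "jetty-servlet" % "9.3.20.v20170531"
dependencyOverrides += "org.eclipse.jetty" % "jetty-util"    % "9.3.20.v20170531"
{code}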



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21373) Update Jetty to 9.3.20.v20170531 to fix CVE-2017-9735

2017-07-11 Thread Kazuaki Ishizaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki updated SPARK-21373:
-
Description: 
This is derived from https://issues.apache.org/jira/browse/FELIX-5664. 
[~aroberts] let me know about the CVE.

Spark 2.2 uses jetty 9.3.11.v20160721, which is vulnerable to CVE-2017-9735
* https://nvd.nist.gov/vuln/detail/CVE-2017-9735
* https://github.com/eclipse/jetty.project/issues/1556

We should upgrade jetty to 9.3.20.v20170531, which has been released to fix the CVE.

  was:
This is derived from https://issues.apache.org/jira/browse/FELIX-5664

Spark 2.2 uses jetty 9.3.11.v20160721 that is sensitive to CVE-2017-9735
* https://nvd.nist.gov/vuln/detail/CVE-2017-9735
* https://github.com/eclipse/jetty.project/issues/1556

We should upgrade jetty to 9.3.20.v20170531 that has released to fix the CVE.


> Update Jetty to 9.3.20.v20170531 to fix CVE-2017-9735
> -
>
> Key: SPARK-21373
> URL: https://issues.apache.org/jira/browse/SPARK-21373
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.1, 2.3.0
>Reporter: Kazuaki Ishizaki
>
> This is derived from https://issues.apache.org/jira/browse/FELIX-5664. 
> [~aroberts] let me know the CVE.
> Spark 2.2 uses jetty 9.3.11.v20160721 that is sensitive to CVE-2017-9735
> * https://nvd.nist.gov/vuln/detail/CVE-2017-9735
> * https://github.com/eclipse/jetty.project/issues/1556
> We should upgrade jetty to 9.3.20.v20170531 that has released to fix the CVE.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21364) IndexOutOfBoundsException on equality check of two complex array elements

2017-07-10 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16080963#comment-16080963
 ] 

Kazuaki Ishizaki commented on SPARK-21364:
--

When I ran the following test case, which is derived from the repro, I got the 
result without any exception on the master branch or 2.1.1.
Am I making a mistake?

{code}
  test("SPARK-21364") {
val data = Seq(
  "{\"menu\":{\"id\":\"file\",\"value\":\"File\",\"popup\":{\"menuitem\":[" 
+
"{\"value\":\"New\",\"onclick\":\"CreateNewDoc()\"}," +
"{\"value\":\"Open\",\"onclick\":\"OpenDoc()\"}, " +
"{\"value\":\"Close\",\"onclick\":\"CloseDoc()\"}" +
"]}}}")
val df = sqlContext.read.json(sparkContext.parallelize(data))
df.select($"menu.popup.menuitem"(lit(0)). === 
($"menu.popup.menuitem"(lit(1.show
  }
{code}

{code}
+-+
|(menu.popup.menuitem[0] = menu.popup.menuitem[1])|
+-+
|false|
+-+
{code}

> IndexOutOfBoundsException on equality check of two complex array elements
> -
>
> Key: SPARK-21364
> URL: https://issues.apache.org/jira/browse/SPARK-21364
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Vivek Patangiwar
>Priority: Minor
>
> Getting an IndexOutOfBoundsException with the following code:
> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.SparkSession
> object ArrayEqualityTest {
>   def main(s:Array[String]) {
> val sparkSession = 
> SparkSession.builder().master("local[*]").appName("app").getOrCreate()
> val sqlContext = sparkSession.sqlContext
> val sc = sparkSession.sqlContext.sparkContext
> import sparkSession.implicits._
> val df = 
> sqlContext.read.json(sc.parallelize(Seq("{\"menu\":{\"id\":\"file\",\"value\":\"File\",\"popup\":{\"menuitem\":[{\"value\":\"New\",\"onclick\":\"CreateNewDoc()\"},{\"value\":\"Open\",\"onclick\":\"OpenDoc()\"},{\"value\":\"Close\",\"onclick\":\"CloseDoc()\"}]}}}")))
> 
> df.select($"menu.popup.menuitem"(lit(0)).===($"menu.popup.menuitem"(lit(1.show
>   }
> }
> Here's the complete stack-trace:
> Exception in thread "main" java.lang.IndexOutOfBoundsException: 1
>   at 
> scala.collection.LinearSeqOptimized$class.apply(LinearSeqOptimized.scala:65)
>   at scala.collection.immutable.List.apply(List.scala:84)
>   at 
> org.apache.spark.sql.catalyst.expressions.BoundReference.doGenCode(BoundAttribute.scala:64)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:101)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$$anonfun$3.apply(GenerateOrdering.scala:76)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$$anonfun$3.apply(GenerateOrdering.scala:75)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.genComparisons(GenerateOrdering.scala:75)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.genComparisons(GenerateOrdering.scala:68)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext.genComp(CodeGenerator.scala:559)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext.genEqual(CodeGenerator.scala:486)
>   at 
> org.apache.spark.sql.catalyst.expressions.EqualTo$$anonfun$doGenCode$4.apply(predicates.scala:437)
>   at 
> org.apache.spark.sql.catalyst.expressions.EqualTo$$anonfun$doGenCode$4.apply(predicates.scala:437)
>   at 
> org.apache.spark.sql.catalyst.expressions.BinaryExpression$$anonfun$defineCodeGen$2.apply(Expression.scala:442)
>   at 
> org.apache.spark.sql.catalyst.expressions.BinaryExpression$$anonfun$defineCodeGen$2.apply(Expression.scala:441)
>   at 
> org.apache.spark.sql.catalyst.ex

[jira] [Commented] (SPARK-21337) SQL which has large ‘case when’ expressions may cause code generation beyond 64KB

2017-07-08 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16079032#comment-16079032
 ] 

Kazuaki Ishizaki commented on SPARK-21337:
--

I cannot reproduce this using either the latest branch-2.1 or the v2.1 tag.
Is this issue specific to CDH?

> SQL which has large ‘case when’ expressions may cause code generation beyond 
> 64KB
> -
>
> Key: SPARK-21337
> URL: https://issues.apache.org/jira/browse/SPARK-21337
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1
> Environment: spark-2.1.1-hadoop-2.6.0-cdh-5.4.2
>Reporter: fengchaoge
> Fix For: 2.1.1
>
>
> when there are large 'case when ' expressions in spark sql,the CodeGenerator 
> failed to compile it. 
> Error message is followed by a huge dump of generated source code,at last 
> failed.
> java.util.concurrent.ExecutionException: java.lang.Exception: failed to 
> compile: org.codehaus.janino.JaninoRuntimeException: Code of method 
> "apply_9$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V"
>  of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection"
>  grows beyond 64 KB.
> It seems that SPARK-13242 solved this problem in spark-1.6.2; however, it 
> appears in spark-2.1.1 again. 
> https://issues.apache.org/jira/browse/SPARK-13242.
> is there something wrong ? 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21344) BinaryType comparison does signed byte array comparison

2017-07-08 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16079010#comment-16079010
 ] 

Kazuaki Ishizaki commented on SPARK-21344:
--

I will work on this unless anyone has already finished a PR.
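
For illustration, a small self-contained sketch of the difference the ticket describes, i.e. signed versus unsigned lexicographic byte comparison (the helper names are mine, not Spark's):

{code}
// Signed lexicographic comparison: a byte with the high bit set is negative in
// Java/Scala, so 0x80 sorts *below* 0x7F, which is wrong for big-endian encoded values.
def compareSigned(a: Array[Byte], b: Array[Byte]): Int = {
  var i = 0
  while (i < math.min(a.length, b.length)) {
    val c = java.lang.Byte.compare(a(i), b(i))
    if (c != 0) return c
    i += 1
  }
  a.length - b.length
}

// Unsigned lexicographic comparison: mask each byte to 0..255 before comparing.
def compareUnsigned(a: Array[Byte], b: Array[Byte]): Int = {
  var i = 0
  while (i < math.min(a.length, b.length)) {
    val c = java.lang.Integer.compare(a(i) & 0xFF, b(i) & 0xFF)
    if (c != 0) return c
    i += 1
  }
  a.length - b.length
}

val lo = Array(0x00, 0x7F).map(_.toByte)
val hi = Array(0x00, 0x80).map(_.toByte)
println(compareSigned(lo, hi))    // positive: signed ordering puts lo above hi (unexpected)
println(compareUnsigned(lo, hi))  // negative: unsigned ordering puts lo below hi
{code}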

> BinaryType comparison does signed byte array comparison
> ---
>
> Key: SPARK-21344
> URL: https://issues.apache.org/jira/browse/SPARK-21344
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.1.1
>Reporter: Shubham Chopra
>
> BinaryType used by Spark SQL defines ordering using signed byte comparisons. 
> This can lead to unexpected behavior. Consider the following code snippet 
> that shows this error:
> {code}
> case class TestRecord(col0: Array[Byte])
> def convertToBytes(i: Long): Array[Byte] = {
> val bb = java.nio.ByteBuffer.allocate(8)
> bb.putLong(i)
> bb.array
>   }
> def test = {
> val sql = spark.sqlContext
> import sql.implicits._
> val timestamp = 1498772083037L
> val data = (timestamp to timestamp + 1000L).map(i => 
> TestRecord(convertToBytes(i)))
> val testDF = sc.parallelize(data).toDF
> val filter1 = testDF.filter(col("col0") >= convertToBytes(timestamp) && 
> col("col0") < convertToBytes(timestamp + 50L))
> val filter2 = testDF.filter(col("col0") >= convertToBytes(timestamp + 
> 50L) && col("col0") < convertToBytes(timestamp + 100L))
> val filter3 = testDF.filter(col("col0") >= convertToBytes(timestamp) && 
> col("col0") < convertToBytes(timestamp + 100L))
> assert(filter1.count == 50)
> assert(filter2.count == 50)
> assert(filter3.count == 100)
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21337) SQL which has large ‘case when’ expressions may cause code generation beyond 64KB

2017-07-07 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16077952#comment-16077952
 ] 

Kazuaki Ishizaki commented on SPARK-21337:
--

In the master branch, I do not see a huge dump and did not get a failure.
If I am correct, should we backport a fix into 2.1.1?

{code}
  test("split complex single column expressions") {
val cases = 50
val conditionClauses = 20

// Generate an individual case
def generateCase(n: Int): (Expression, Expression) = {
  val condition = (1 to conditionClauses)
  .map(c => EqualTo(BoundReference(0, StringType, false), 
Literal(s"$c:$n")))
  .reduceLeft[Expression]((l, r) => Or(l, r))
  (condition, Literal(n))
}

val expression = CaseWhen((1 to cases).map(generateCase(_)))

    // Currently this throws a java.util.concurrent.ExecutionException wrapping a
    // org.codehaus.janino.JaninoRuntimeException: Code of method XXX of class YYY grows beyond 64 KB
val plan = GenerateMutableProjection.generate(Seq(expression))
  }
{code}

> SQL which has large ‘case when’ expressions may cause code generation beyond 
> 64KB
> -
>
> Key: SPARK-21337
> URL: https://issues.apache.org/jira/browse/SPARK-21337
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1
> Environment: spark-2.1.1-hadoop-2.6.0-cdh-5.4.2
>Reporter: fengchaoge
> Fix For: 2.1.1
>
>
> when there are large 'case when ' expressions in spark sql,the CodeGenerator 
> failed to compile it. 
> Error message is followed by a huge dump of generated source code,at last 
> failed.
> java.util.concurrent.ExecutionException: java.lang.Exception: failed to 
> compile: org.codehaus.janino.JaninoRuntimeException: Code of method 
> "apply_9$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V"
>  of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection"
>  grows beyond 64 KB.
> It seems like SPARK-13242 solved this problem in spark-1.6.1; however, it 
> appears in spark-2.1.1 again. 
> https://issues.apache.org/jira/browse/SPARK-13242.
> is there something wrong ? 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


