Re: Welcoming Tejas Patil as a Spark committer

2017-09-30 Thread Kazuaki Ishizaki
Congratulation Tejas!

Kazuaki Ishizaki



From:   Matei Zaharia <matei.zaha...@gmail.com>
To: "dev@spark.apache.org" <dev@spark.apache.org>
Date:   2017/09/30 04:58
Subject:Welcoming Tejas Patil as a Spark committer



Hi all,

The Spark PMC recently added Tejas Patil as a committer on the
project. Tejas has been contributing across several areas of Spark for
a while, focusing especially on scalability issues and SQL. Please
join me in welcoming Tejas!

Matei

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org






[jira] [Commented] (SPARK-18016) Code Generation: Constant Pool Past Limit for Wide/Nested Dataset

2017-09-28 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16184206#comment-16184206
 ] 

Kazuaki Ishizaki commented on SPARK-18016:
--

Thank you for reporting this again.
I pinged the original author in [this PR|https://github.com/apache/spark/pull/16648], but it has not been resolved yet.

> Code Generation: Constant Pool Past Limit for Wide/Nested Dataset
> -
>
> Key: SPARK-18016
> URL: https://issues.apache.org/jira/browse/SPARK-18016
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Aleksander Eskilson
>Assignee: Aleksander Eskilson
> Fix For: 2.3.0
>
>
> When attempting to encode collections of large Java objects to Datasets 
> having very wide or deeply nested schemas, code generation can fail, yielding:
> {code}
> Caused by: org.codehaus.janino.JaninoRuntimeException: Constant pool for 
> class 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection
>  has grown past JVM limit of 0x
>   at 
> org.codehaus.janino.util.ClassFile.addToConstantPool(ClassFile.java:499)
>   at 
> org.codehaus.janino.util.ClassFile.addConstantNameAndTypeInfo(ClassFile.java:439)
>   at 
> org.codehaus.janino.util.ClassFile.addConstantMethodrefInfo(ClassFile.java:358)
>   at 
> org.codehaus.janino.UnitCompiler.writeConstantMethodrefInfo(UnitCompiler.java:4)
>   at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4547)
>   at org.codehaus.janino.UnitCompiler.access$7500(UnitCompiler.java:206)
>   at 
> org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:3774)
>   at 
> org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:3762)
>   at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:4328)
>   at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3762)
>   at 
> org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4933)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:3180)
>   at org.codehaus.janino.UnitCompiler.access$5000(UnitCompiler.java:206)
>   at 
> org.codehaus.janino.UnitCompiler$9.visitMethodInvocation(UnitCompiler.java:3151)
>   at 
> org.codehaus.janino.UnitCompiler$9.visitMethodInvocation(UnitCompiler.java:3139)
>   at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:4328)
>   at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3139)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2112)
>   at org.codehaus.janino.UnitCompiler.access$1700(UnitCompiler.java:206)
>   at 
> org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1377)
>   at 
> org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1370)
>   at org.codehaus.janino.Java$ExpressionStatement.accept(Java.java:2558)
>   at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:1370)
>   at 
> org.codehaus.janino.UnitCompiler.compileStatements(UnitCompiler.java:1450)
>   at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:2811)
>   at 
> org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1262)
>   at 
> org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1234)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:538)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:890)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:894)
>   at org.codehaus.janino.UnitCompiler.access$600(UnitCompiler.java:206)
>   at 
> org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:377)
>   at 
> org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:369)
>   at 
> org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1128)
>   at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:369)
>   at 
> org.codehaus.janino.UnitCompiler.compileDeclaredMemberTypes(UnitCompiler.java:1209)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:564)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:420)
>   at org.codehaus.janino.UnitCompiler.access$400(UnitCompiler.java:206)
>   at 
> org.codehaus.janino.UnitCompiler$2.visitPackageMemberClassDeclaration(UnitCompiler.java:374)
>   at 
> org.codehaus.janino.

Re: [VOTE] Spark 2.1.2 (RC2)

2017-09-27 Thread Kazuaki Ishizaki
+1 (non-binding)

I tested it on Ubuntu 16.04 and OpenJDK8 on ppc64le. All of the tests for 
core/sql-core/sql-catalyst/mllib/mllib-local have passed.

$ java -version
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 
1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)

% build/mvn -DskipTests -Phive -Phive-thriftserver -Pyarn -Phadoop-2.7 -T 
24 clean package install
% build/mvn -Phive -Phive-thriftserver -Pyarn -Phadoop-2.7 test -pl core 
-pl 'sql/core' -pl 'sql/catalyst' -pl mllib -pl mllib-local
...
Run completed in 12 minutes, 42 seconds.
Total number of tests run: 1035
Suites: completed 166, aborted 0
Tests: succeeded 1035, failed 0, canceled 0, ignored 5, pending 0
All tests passed.
[INFO] 

[INFO] Reactor Summary:
[INFO] 
[INFO] Spark Project Core . SUCCESS [17:14 
min]
[INFO] Spark Project ML Local Library . SUCCESS [ 
4.067 s]
[INFO] Spark Project Catalyst . SUCCESS [08:23 
min]
[INFO] Spark Project SQL .. SUCCESS [10:50 
min]
[INFO] Spark Project ML Library ... SUCCESS [15:45 
min]
[INFO] 

[INFO] BUILD SUCCESS
[INFO] 

[INFO] Total time: 52:20 min
[INFO] Finished at: 2017-09-28T12:16:46+09:00
[INFO] Final Memory: 103M/309M
[INFO] 

[WARNING] The requested profile "hive" could not be activated because it 
does not exist.

Kazuaki Ishizaki



From:   Dongjoon Hyun <dongjoon.h...@gmail.com>
To: Denny Lee <denny.g@gmail.com>
Cc: Sean Owen <so...@cloudera.com>, Holden Karau 
<hol...@pigscanfly.ca>, "dev@spark.apache.org" <dev@spark.apache.org>
Date:   2017/09/28 07:57
Subject:Re: [VOTE] Spark 2.1.2 (RC2)



+1 (non-binding)

Bests,
Dongjoon.


On Wed, Sep 27, 2017 at 7:54 AM, Denny Lee <denny.g@gmail.com> wrote:
+1 (non-binding)


On Wed, Sep 27, 2017 at 6:54 AM Sean Owen <so...@cloudera.com> wrote:
+1

I tested the source release.
Hashes and signature (your signature) check out; the project builds and tests 
pass with -Phadoop-2.7 -Pyarn -Phive -Pmesos on Debian 9.
The list of issues looks good and there are no open issues at all for 2.1.2.

Great work on improving the build process and docs.


On Wed, Sep 27, 2017 at 5:47 AM Holden Karau <hol...@pigscanfly.ca> wrote:
Please vote on releasing the following candidate as Apache Spark 
version 2.1.2. The vote is open until Wednesday October 4th at 23:59 
PST and passes if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.1.2
[ ] -1 Do not release this package because ...


To learn more about Apache Spark, please see https://spark.apache.org/

The tag to be voted on is v2.1.2-rc2 (
fabbb7f59e47590114366d14e15fbbff8c88593c)

List of JIRA tickets resolved in this release can be found with this 
filter.

The release files, including signatures, digests, etc. can be found at:
https://home.apache.org/~holden/spark-2.1.2-rc2-bin/

Release artifacts are signed with a key from:
https://people.apache.org/~holden/holdens_keys.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1251

The documentation corresponding to this release can be found at:
https://people.apache.org/~holden/spark-2.1.2-rc2-docs/


FAQ

How can I help test this release?

If you are a Spark user, you can help us test this release by taking an 
existing Spark workload and running it on this release candidate, then 
reporting any regressions.

If you're working in PySpark you can set up a virtual env, install the 
current RC, and see if anything important breaks; in Java/Scala you can 
add the staging repository to your project's resolvers and test with the 
RC (make sure to clean up the artifact cache before/after so you don't 
end up building with an out-of-date RC going forward).

What should happen to JIRA tickets still targeting 2.1.2?

Committers should look at those and triage. Extremely important bug fixes, 
documentation, and API tweaks that impact compatibility should be worked 
on immediately. Everything else please retarget to 2.1.3.

But my bug isn't fixed!??!

In order to make timely releases, we will typically not hold the release 
unless the bug in question is a regression from 2.1.1. That being said, if 
there is something which is a regression from 2.1.1 that has not been 
correctly targeted, please ping a committer to help target the issue (you 
can see the open issues listed as impacting Spark 2.1.1 & 2.1.2)

What are the unresolved issues targeted for 2.1.2?

At this time th

[jira] [Updated] (SPARK-22130) UTF8String.trim() inefficiently scans all white-space string twice.

2017-09-26 Thread Kazuaki Ishizaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki updated SPARK-22130:
-
Issue Type: Improvement  (was: Bug)

> UTF8String.trim() inefficiently scans all white-space string twice.
> ---
>
> Key: SPARK-22130
> URL: https://issues.apache.org/jira/browse/SPARK-22130
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Kazuaki Ishizaki
>Priority: Minor
>
> {{UTF8String.trim()}} scans a string consisting only of white space (e.g. {{" "}}) twice, which is inefficient.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16845) org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB

2017-09-26 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16181190#comment-16181190
 ] 

Kazuaki Ishizaki commented on SPARK-16845:
--

[~mvelusce] Thank you for reporting the issue with a repro. I can reproduce this.

If I am correct, Spark 2.2 can fall back to a non-codegen path thanks to 
[this PR|https://github.com/apache/spark/pull/17087]. We once tried to backport 
it to Spark 2.1, but the backport was rejected.
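
For background, a conceptual sketch of that kind of fallback (my simplification, not Spark's actual implementation; {{compileWithJanino}} and {{interpretedOrdering}} in the usage comment are hypothetical names):

{code}
// Conceptual sketch only (not Spark's actual code): try the codegen path and, if
// compilation fails, fall back to a pre-built interpreted implementation instead of
// failing the query.
def withCodegenFallback[T](compiled: => T)(interpreted: => T): T =
  try {
    compiled
  } catch {
    case e: Exception =>
      Console.err.println(s"codegen failed (${e.getMessage}); falling back to interpreted path")
      interpreted
  }

// Hypothetical usage: withCodegenFallback(compileWithJanino(source))(interpretedOrdering)
{code}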

> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" 
> grows beyond 64 KB
> -
>
> Key: SPARK-16845
> URL: https://issues.apache.org/jira/browse/SPARK-16845
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: hejie
>Assignee: Liwei Lin
> Fix For: 1.6.4, 2.0.3, 2.1.1, 2.2.0
>
> Attachments: error.txt.zip
>
>
> I have a wide table(400 columns), when I try fitting the traindata on all 
> columns,  the fatal error occurs. 
>   ... 46 more
> Caused by: org.codehaus.janino.JaninoRuntimeException: Code of method 
> "(Lorg/apache/spark/sql/catalyst/InternalRow;Lorg/apache/spark/sql/catalyst/InternalRow;)I"
>  of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" 
> grows beyond 64 KB
>   at org.codehaus.janino.CodeContext.makeSpace(CodeContext.java:941)
>   at org.codehaus.janino.CodeContext.write(CodeContext.java:854)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22130) UTF8String.trim() inefficiently scans all white-space string twice.

2017-09-26 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16181070#comment-16181070
 ] 

Kazuaki Ishizaki commented on SPARK-22130:
--

I will submit a PR soon.
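
A minimal sketch of the single-pass approach (shown here on a plain {{String}} for illustration; the actual change is in {{UTF8String}}, and this assumes only the ASCII space character counts as white space, as in {{UTF8String.trim()}}):

{code}
// Minimal single-pass trim sketch on a plain String (illustration only; the real fix
// lives in UTF8String). Only the ASCII space character is treated as white space here.
def trimSinglePass(s: String): String = {
  var start = 0
  var end = s.length - 1
  while (start <= end && s.charAt(start) == ' ') start += 1   // find first non-space
  while (end > start && s.charAt(end) == ' ') end -= 1        // scan back only over the tail
  if (start > end) "" else s.substring(start, end + 1)        // all-space input needs no second full scan
}
{code}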

> UTF8String.trim() inefficiently scans all white-space string twice.
> ---
>
> Key: SPARK-22130
> URL: https://issues.apache.org/jira/browse/SPARK-22130
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Kazuaki Ishizaki
>Priority: Minor
>
> {{UTF8String.trim()}} scans a string consisting only of white space (e.g. {{" "}}) twice, which is inefficient.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-22130) UTF8String.trim() inefficiently scans all white-space string twice.

2017-09-26 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-22130:


 Summary: UTF8String.trim() inefficiently scans all white-space 
string twice.
 Key: SPARK-22130
 URL: https://issues.apache.org/jira/browse/SPARK-22130
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.2.0
Reporter: Kazuaki Ishizaki
Priority: Minor


{{UTF8String.trim()}} scans a string consisting only of white space (e.g. {{" "}}) twice, which is inefficient.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-22105) Dataframe has poor performance when computing on many columns with codegen

2017-09-22 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16176655#comment-16176655
 ] 

Kazuaki Ishizaki edited comment on SPARK-22105 at 9/22/17 4:22 PM:
---

Can these PRs at https://issues.apache.org/jira/browse/SPARK-21870 and 
https://issues.apache.org/jira/browse/SPARK-21871 alleviate this issue?


was (Author: kiszk):
Can this PR at https://issues.apache.org/jira/browse/SPARK-21871 alleviate this 
issue?

> Dataframe has poor performance when computing on many columns with codegen
> --
>
> Key: SPARK-22105
> URL: https://issues.apache.org/jira/browse/SPARK-22105
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, SQL
>Affects Versions: 2.3.0
>Reporter: Weichen Xu
>Priority: Minor
>
> Suppose we have a DataFrame with many columns (e.g. 100 columns), each of 
> DoubleType, and we need to compute avg on each column. Using the DataFrame 
> avg turns out to be much slower than using RDD.aggregate.
> I observed this issue in this PR (one-pass imputer):
> https://github.com/apache/spark/pull/18902
> I also wrote minimal test code that reproduces the issue, using sum:
> https://github.com/apache/spark/compare/master...WeichenXu123:aggr_test2?expand=1
> When we compute `sum` on 100 `DoubleType` columns, the DataFrame version is 
> about 3x slower than `RDD.aggregate`, but if we compute only one column, the 
> DataFrame version is much faster than `RDD.aggregate`.
> The reason should be a defect in DataFrame codegen: codegen inlines everything 
> and generates one large code block. When the column count is large (e.g. 100 
> columns), the generated code becomes too large, so the JVM fails to JIT-compile 
> it and falls back to bytecode interpretation.
> This PR should address the issue:
> https://github.com/apache/spark/pull/19082
> After the above PR is merged, we need more performance tests against the ML 
> code to check whether the issue is actually fixed.
> This JIRA tracks this performance issue.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22105) Dataframe has poor performance when computing on many columns with codegen

2017-09-22 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16176655#comment-16176655
 ] 

Kazuaki Ishizaki commented on SPARK-22105:
--

Can this PR at https://issues.apache.org/jira/browse/SPARK-21871 alleviate this 
issue?

> Dataframe has poor performance when computing on many columns with codegen
> --
>
> Key: SPARK-22105
> URL: https://issues.apache.org/jira/browse/SPARK-22105
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, SQL
>Affects Versions: 2.3.0
>Reporter: Weichen Xu
>Priority: Minor
>
> Suppose we have a DataFrame with many columns (e.g. 100 columns), each of 
> DoubleType, and we need to compute avg on each column. Using the DataFrame 
> avg turns out to be much slower than using RDD.aggregate.
> I observed this issue in this PR (one-pass imputer):
> https://github.com/apache/spark/pull/18902
> I also wrote minimal test code that reproduces the issue, using sum:
> https://github.com/apache/spark/compare/master...WeichenXu123:aggr_test2?expand=1
> When we compute `sum` on 100 `DoubleType` columns, the DataFrame version is 
> about 3x slower than `RDD.aggregate`, but if we compute only one column, the 
> DataFrame version is much faster than `RDD.aggregate`.
> The reason should be a defect in DataFrame codegen: codegen inlines everything 
> and generates one large code block. When the column count is large (e.g. 100 
> columns), the generated code becomes too large, so the JVM fails to JIT-compile 
> it and falls back to bytecode interpretation.
> This PR should address the issue:
> https://github.com/apache/spark/pull/19082
> After the above PR is merged, we need more performance tests against the ML 
> code to check whether the issue is actually fixed.
> This JIRA tracks this performance issue.
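
For reference, a minimal sketch of the DataFrame-vs-{{RDD.aggregate}} comparison described above (my own setup, not the reporter's benchmark; the column count, column names, and row count are illustrative):

{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum}

object WideSumSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("wide-sum").getOrCreate()
    import spark.implicits._

    val numCols = 100
    // 100 DoubleType columns derived from a single id column (names are illustrative).
    val df = spark.range(0, 1000000).toDF("id")
      .select((0 until numCols).map(i => ($"id" * i).cast("double").as(s"c$i")): _*)

    // DataFrame path: all 100 aggregates end up in one generated code block.
    val dfSums = df.agg(sum(col("c0")), (1 until numCols).map(i => sum(col(s"c$i"))): _*).first()

    // RDD path: a hand-written aggregate over Array[Double].
    val rddSums = df.rdd
      .map(row => Array.tabulate(numCols)(row.getDouble))
      .aggregate(new Array[Double](numCols))(
        (acc, values) => { var i = 0; while (i < numCols) { acc(i) += values(i); i += 1 }; acc },
        (a, b) => { var i = 0; while (i < numCols) { a(i) += b(i); i += 1 }; a })

    println(s"DataFrame c0 sum = ${dfSums.getDouble(0)}, RDD c0 sum = ${rddSums(0)}")
    spark.stop()
  }
}
{code}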



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22000) org.codehaus.commons.compiler.CompileException: toString method is not declared

2017-09-18 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16170319#comment-16170319
 ] 

Kazuaki Ishizaki commented on SPARK-22000:
--

Without sample code, it may take a long time to fix this.
Is it possible to attach the full code, or at least the code that creates the 
Dataset or DataFrame?

> org.codehaus.commons.compiler.CompileException: toString method is not 
> declared
> ---
>
> Key: SPARK-22000
> URL: https://issues.apache.org/jira/browse/SPARK-22000
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: taiho choi
>
> The error message says that toString is not declared on "value13", which has 
> the primitive "long" type in the generated code.
> I think value13 should be the boxed Long type.
> ==error message
> Caused by: org.codehaus.commons.compiler.CompileException: File 
> 'generated.java', Line 70, Column 32: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 70, Column 32: A method named "toString" is not declared in any enclosing 
> class nor any supertype, nor through a static import
> /* 033 */   private void apply1_2(InternalRow i) {
> /* 034 */
> /* 035 */
> /* 036 */ boolean isNull11 = i.isNullAt(1);
> /* 037 */ UTF8String value11 = isNull11 ? null : (i.getUTF8String(1));
> /* 038 */ boolean isNull10 = true;
> /* 039 */ java.lang.String value10 = null;
> /* 040 */ if (!isNull11) {
> /* 041 */
> /* 042 */   isNull10 = false;
> /* 043 */   if (!isNull10) {
> /* 044 */
> /* 045 */ Object funcResult4 = null;
> /* 046 */ funcResult4 = value11.toString();
> /* 047 */
> /* 048 */ if (funcResult4 != null) {
> /* 049 */   value10 = (java.lang.String) funcResult4;
> /* 050 */ } else {
> /* 051 */   isNull10 = true;
> /* 052 */ }
> /* 053 */
> /* 054 */
> /* 055 */   }
> /* 056 */ }
> /* 057 */ javaBean.setApp(value10);
> /* 058 */
> /* 059 */
> /* 060 */ boolean isNull13 = i.isNullAt(12);
> /* 061 */ long value13 = isNull13 ? -1L : (i.getLong(12));
> /* 062 */ boolean isNull12 = true;
> /* 063 */ java.lang.String value12 = null;
> /* 064 */ if (!isNull13) {
> /* 065 */
> /* 066 */   isNull12 = false;
> /* 067 */   if (!isNull12) {
> /* 068 */
> /* 069 */ Object funcResult5 = null;
> /* 070 */ funcResult5 = value13.toString();
> /* 071 */
> /* 072 */ if (funcResult5 != null) {
> /* 073 */   value12 = (java.lang.String) funcResult5;
> /* 074 */ } else {
> /* 075 */   isNull12 = true;
> /* 076 */ }
> /* 077 */
> /* 078 */
> /* 079 */   }
> /* 080 */ }
> /* 081 */ javaBean.setReasonCode(value12);
> /* 082 */
> /* 083 */   }



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22033) BufferHolder size checks should account for the specific VM array size limitations

2017-09-17 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16169318#comment-16169318
 ] 

Kazuaki Ishizaki commented on SPARK-22033:
--

I think {{ColumnVector}} and {{HashMapGrowthStrategy}} may have a similar issue.
What do you think?
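
A sketch of the kind of VM-safe bound being discussed (my assumption: mirror the {{Integer.MAX_VALUE - 8}} headroom used by {{java.util.ArrayList}}; the exact limit varies by VM):

{code}
// Sketch only: a growth helper that never requests an array larger than the
// commonly used VM-safe ceiling of Integer.MAX_VALUE - 8.
val MaxRoundedArrayLength: Int = Integer.MAX_VALUE - 8

def nextCapacity(current: Int, neededExtra: Int): Int = {
  val required = current.toLong + neededExtra
  if (required > MaxRoundedArrayLength) {
    throw new UnsupportedOperationException(
      s"Cannot grow buffer past $MaxRoundedArrayLength bytes, need $required")
  }
  // Double when possible, but never past the VM-safe ceiling.
  math.min(math.max(current.toLong * 2, required), MaxRoundedArrayLength.toLong).toInt
}
{code}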

> BufferHolder size checks should account for the specific VM array size 
> limitations
> --
>
> Key: SPARK-22033
> URL: https://issues.apache.org/jira/browse/SPARK-22033
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Vadim Semenov
>Priority: Minor
>
> User may get the following OOM Error while running a job with heavy 
> aggregations
> ```
> java.lang.OutOfMemoryError: Requested array size exceeds VM limit
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder.grow(BufferHolder.java:73)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:235)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter.write(UnsafeRowWriter.java:228)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$generateResultProjection$2.apply(AggregationIterator.scala:254)
>   at 
> org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$generateResultProjection$2.apply(AggregationIterator.scala:247)
>   at 
> org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.next(ObjectAggregationIterator.scala:88)
>   at 
> org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.next(ObjectAggregationIterator.scala:33)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>   at 
> org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:167)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
>   at org.apache.spark.scheduler.Task.run(Task.scala:108)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> ```
> [`BufferHolder.grow` tries to create a byte array of `Integer.MAX_VALUE` bytes 
> here](https://github.com/apache/spark/blob/v2.2.0/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolder.java#L72), 
> but the maximum size of an array depends on the specifics of the VM.
> The safest value seems to be `Integer.MAX_VALUE - 8` 
> http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/java/util/ArrayList.java#l229
> In my JVM:
> ```
> java -version
> openjdk version "1.8.0_141"
> OpenJDK Runtime Environment (build 1.8.0_141-b16)
> OpenJDK 64-Bit Server VM (build 25.141-b16, mixed mode)
> ```
> the max is `new Array[Byte](Integer.MAX_VALUE - 2)`



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22000) org.codehaus.commons.compiler.CompileException: toString method is not declared

2017-09-14 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165902#comment-16165902
 ] 

Kazuaki Ishizaki commented on SPARK-22000:
--

Thank you for the good suggestion. I will try to use {{String.valueOf}}.
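
A tiny illustration, written in Scala for brevity even though the generated code is Java, of the two candidate shapes; both give the same result, but {{String.valueOf}} avoids allocating a boxed {{Long}}:

{code}
// value13 stands in for the primitive long read from the row in the generated code.
val value13: Long = -1L
val viaBoxing  = java.lang.Long.valueOf(value13).toString   // like ((Long) value13).toString()
val viaValueOf = java.lang.String.valueOf(value13)          // like String.valueOf(value13)
assert(viaBoxing == viaValueOf)
{code}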

> org.codehaus.commons.compiler.CompileException: toString method is not 
> declared
> ---
>
> Key: SPARK-22000
> URL: https://issues.apache.org/jira/browse/SPARK-22000
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: taiho choi
>
> The error message says that toString is not declared on "value13", which has 
> the primitive "long" type in the generated code.
> I think value13 should be the boxed Long type.
> ==error message
> Caused by: org.codehaus.commons.compiler.CompileException: File 
> 'generated.java', Line 70, Column 32: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 70, Column 32: A method named "toString" is not declared in any enclosing 
> class nor any supertype, nor through a static import
> /* 033 */   private void apply1_2(InternalRow i) {
> /* 034 */
> /* 035 */
> /* 036 */ boolean isNull11 = i.isNullAt(1);
> /* 037 */ UTF8String value11 = isNull11 ? null : (i.getUTF8String(1));
> /* 038 */ boolean isNull10 = true;
> /* 039 */ java.lang.String value10 = null;
> /* 040 */ if (!isNull11) {
> /* 041 */
> /* 042 */   isNull10 = false;
> /* 043 */   if (!isNull10) {
> /* 044 */
> /* 045 */ Object funcResult4 = null;
> /* 046 */ funcResult4 = value11.toString();
> /* 047 */
> /* 048 */ if (funcResult4 != null) {
> /* 049 */   value10 = (java.lang.String) funcResult4;
> /* 050 */ } else {
> /* 051 */   isNull10 = true;
> /* 052 */ }
> /* 053 */
> /* 054 */
> /* 055 */   }
> /* 056 */ }
> /* 057 */ javaBean.setApp(value10);
> /* 058 */
> /* 059 */
> /* 060 */ boolean isNull13 = i.isNullAt(12);
> /* 061 */ long value13 = isNull13 ? -1L : (i.getLong(12));
> /* 062 */ boolean isNull12 = true;
> /* 063 */ java.lang.String value12 = null;
> /* 064 */ if (!isNull13) {
> /* 065 */
> /* 066 */   isNull12 = false;
> /* 067 */   if (!isNull12) {
> /* 068 */
> /* 069 */ Object funcResult5 = null;
> /* 070 */ funcResult5 = value13.toString();
> /* 071 */
> /* 072 */ if (funcResult5 != null) {
> /* 073 */   value12 = (java.lang.String) funcResult5;
> /* 074 */ } else {
> /* 075 */   isNull12 = true;
> /* 076 */ }
> /* 077 */
> /* 078 */
> /* 079 */   }
> /* 080 */ }
> /* 081 */ javaBean.setReasonCode(value12);
> /* 082 */
> /* 083 */   }



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22000) org.codehaus.commons.compiler.CompileException: toString method is not declared

2017-09-14 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165883#comment-16165883
 ] 

Kazuaki Ishizaki commented on SPARK-22000:
--

It would be good to generate {{((Long)value13).toString()}} to reduce the amount 
of boxing/unboxing.
Anyway, as @maropu pointed out, could you please post the query? Then I will 
create a PR.

> org.codehaus.commons.compiler.CompileException: toString method is not 
> declared
> ---
>
> Key: SPARK-22000
> URL: https://issues.apache.org/jira/browse/SPARK-22000
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: taiho choi
>
> The error message says that toString is not declared on "value13", which has 
> the primitive "long" type in the generated code.
> I think value13 should be the boxed Long type.
> ==error message
> Caused by: org.codehaus.commons.compiler.CompileException: File 
> 'generated.java', Line 70, Column 32: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 70, Column 32: A method named "toString" is not declared in any enclosing 
> class nor any supertype, nor through a static import
> /* 033 */   private void apply1_2(InternalRow i) {
> /* 034 */
> /* 035 */
> /* 036 */ boolean isNull11 = i.isNullAt(1);
> /* 037 */ UTF8String value11 = isNull11 ? null : (i.getUTF8String(1));
> /* 038 */ boolean isNull10 = true;
> /* 039 */ java.lang.String value10 = null;
> /* 040 */ if (!isNull11) {
> /* 041 */
> /* 042 */   isNull10 = false;
> /* 043 */   if (!isNull10) {
> /* 044 */
> /* 045 */ Object funcResult4 = null;
> /* 046 */ funcResult4 = value11.toString();
> /* 047 */
> /* 048 */ if (funcResult4 != null) {
> /* 049 */   value10 = (java.lang.String) funcResult4;
> /* 050 */ } else {
> /* 051 */   isNull10 = true;
> /* 052 */ }
> /* 053 */
> /* 054 */
> /* 055 */   }
> /* 056 */ }
> /* 057 */ javaBean.setApp(value10);
> /* 058 */
> /* 059 */
> /* 060 */ boolean isNull13 = i.isNullAt(12);
> /* 061 */ long value13 = isNull13 ? -1L : (i.getLong(12));
> /* 062 */ boolean isNull12 = true;
> /* 063 */ java.lang.String value12 = null;
> /* 064 */ if (!isNull13) {
> /* 065 */
> /* 066 */   isNull12 = false;
> /* 067 */   if (!isNull12) {
> /* 068 */
> /* 069 */ Object funcResult5 = null;
> /* 070 */ funcResult5 = value13.toString();
> /* 071 */
> /* 072 */ if (funcResult5 != null) {
> /* 073 */   value12 = (java.lang.String) funcResult5;
> /* 074 */ } else {
> /* 075 */   isNull12 = true;
> /* 076 */ }
> /* 077 */
> /* 078 */
> /* 079 */   }
> /* 080 */ }
> /* 081 */ javaBean.setReasonCode(value12);
> /* 082 */
> /* 083 */   }



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21907) NullPointerException in UnsafeExternalSorter.spill()

2017-09-08 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158995#comment-16158995
 ] 

Kazuaki Ishizaki commented on SPARK-21907:
--

If you cannot provide a repro, could you please run your program with the 
latest master branch?
SPARK-21319 may alleviate this issue.

> NullPointerException in UnsafeExternalSorter.spill()
> 
>
> Key: SPARK-21907
> URL: https://issues.apache.org/jira/browse/SPARK-21907
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Juliusz Sompolski
>
> I see NPE during sorting with the following stacktrace:
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.spark.memory.TaskMemoryManager.getPage(TaskMemoryManager.java:383)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter$SortComparator.compare(UnsafeInMemorySorter.java:63)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter$SortComparator.compare(UnsafeInMemorySorter.java:43)
>   at 
> org.apache.spark.util.collection.TimSort.countRunAndMakeAscending(TimSort.java:270)
>   at org.apache.spark.util.collection.TimSort.sort(TimSort.java:142)
>   at org.apache.spark.util.collection.Sorter.sort(Sorter.scala:37)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.getSortedIterator(UnsafeInMemorySorter.java:345)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:206)
>   at 
> org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:203)
>   at 
> org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:281)
>   at 
> org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:90)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.reset(UnsafeInMemorySorter.java:173)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:221)
>   at 
> org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:203)
>   at 
> org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:281)
>   at 
> org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:90)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.growPointerArrayIfNecessary(UnsafeExternalSorter.java:349)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:400)
>   at 
> org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:109)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
>   at 
> org.apache.spark.sql.execution.RowIteratorFromScala.advanceNext(RowIterator.scala:83)
>   at 
> org.apache.spark.sql.execution.joins.SortMergeJoinScanner.advancedStreamed(SortMergeJoinExec.scala:778)
>   at 
> org.apache.spark.sql.execution.joins.SortMergeJoinScanner.findNextInnerJoinRows(SortMergeJoinExec.scala:685)
>   at 
> org.apache.spark.sql.execution.joins.SortMergeJoinExec$$anonfun$doExecute$1$$anon$2.advanceNext(SortMergeJoinExec.scala:259)
>   at 
> org.apache.spark.sql.execution.RowIteratorToScala.hasNext(RowIterator.scala:68)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithKeys$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(S

[jira] [Commented] (SPARK-21905) ClassCastException when call sqlContext.sql on temp table

2017-09-08 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158496#comment-16158496
 ] 

Kazuaki Ishizaki commented on SPARK-21905:
--

When I ran the following code (I do not have the PointUDT and Point classes, so 
I substituted ExamplePointUDT and ExamplePoint), I could not reproduce the 
exception on the master branch or branch-2.2.

{code}
...
import org.apache.spark.sql.catalyst.encoders._
...
import org.apache.spark.sql.types._

  test("SPARK-21905") {
val schema = StructType(List(
  StructField("name", DataTypes.StringType, true),
  StructField("location", new ExamplePointUDT, true)))

val rowRdd = sqlContext.sparkContext.parallelize(Seq("bluejoe", "alex"), 4)
  .map({ x: String => Row.fromSeq(Seq(x, new ExamplePoint(100, 100))) })
val dataFrame = sqlContext.createDataFrame(rowRdd, schema)
dataFrame.createOrReplaceTempView("person")
sqlContext.sql("SELECT * FROM person").foreach(println(_))
  }
{code}

> ClassCastException when call sqlContext.sql on temp table
> -
>
> Key: SPARK-21905
> URL: https://issues.apache.org/jira/browse/SPARK-21905
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: bluejoe
>
> {code:java}
> val schema = StructType(List(
>   StructField("name", DataTypes.StringType, true),
>   StructField("location", new PointUDT, true)))
> val rowRdd = sqlContext.sparkContext.parallelize(Seq("bluejoe", "alex"), 
> 4).map({ x: String ⇒ Row.fromSeq(Seq(x, Point(100, 100))) });
> val dataFrame = sqlContext.createDataFrame(rowRdd, schema)
> dataFrame.createOrReplaceTempView("person");
> sqlContext.sql("SELECT * FROM person").foreach(println(_));
> {code}
> the last statement throws exception:
> {code:java}
> Caused by: java.lang.ClassCastException: 
> org.apache.spark.sql.catalyst.expressions.GenericRow cannot be cast to 
> org.apache.spark.sql.catalyst.InternalRow
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.evalIfFalseExpr1$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:287)
>   ... 18 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21946) Flaky test: InMemoryCatalogedDDLSuite.`alter table: rename cached table`

2017-09-07 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158044#comment-16158044
 ] 

Kazuaki Ishizaki commented on SPARK-21946:
--

If no one is already working on this, I will create a PR.

> Flaky test: InMemoryCatalogedDDLSuite.`alter table: rename cached table`
> 
>
> Key: SPARK-21946
> URL: https://issues.apache.org/jira/browse/SPARK-21946
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Dongjoon Hyun
>Priority: Minor
>
> According to the [Apache Spark Jenkins 
> History|https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7/lastCompletedBuild/testReport/org.apache.spark.sql.execution.command/InMemoryCatalogedDDLSuite/alter_table__rename_cached_table/history/]
> InMemoryCatalogedDDLSuite.`alter table: rename cached table` is very flaky. 
> We had better stabilize this.
> {code}
> - alter table: rename cached table !!! CANCELED !!!
>   Array([2,2], [1,1]) did not equal Array([1,1], [2,2]) bad test: wrong data 
> (DDLSuite.scala:786)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21907) NullPointerException in UnsafeExternalSorter.spill()

2017-09-06 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16156015#comment-16156015
 ] 

Kazuaki Ishizaki commented on SPARK-21907:
--

Thank you for your report. Could you please attach a program that can reproduce 
this issue?

> NullPointerException in UnsafeExternalSorter.spill()
> 
>
> Key: SPARK-21907
> URL: https://issues.apache.org/jira/browse/SPARK-21907
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Juliusz Sompolski
>
> I see NPE during sorting with the following stacktrace:
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.spark.memory.TaskMemoryManager.getPage(TaskMemoryManager.java:383)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter$SortComparator.compare(UnsafeInMemorySorter.java:63)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter$SortComparator.compare(UnsafeInMemorySorter.java:43)
>   at 
> org.apache.spark.util.collection.TimSort.countRunAndMakeAscending(TimSort.java:270)
>   at org.apache.spark.util.collection.TimSort.sort(TimSort.java:142)
>   at org.apache.spark.util.collection.Sorter.sort(Sorter.scala:37)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.getSortedIterator(UnsafeInMemorySorter.java:345)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:206)
>   at 
> org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:203)
>   at 
> org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:281)
>   at 
> org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:90)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.reset(UnsafeInMemorySorter.java:173)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.spill(UnsafeExternalSorter.java:221)
>   at 
> org.apache.spark.memory.TaskMemoryManager.acquireExecutionMemory(TaskMemoryManager.java:203)
>   at 
> org.apache.spark.memory.TaskMemoryManager.allocatePage(TaskMemoryManager.java:281)
>   at 
> org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:90)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.growPointerArrayIfNecessary(UnsafeExternalSorter.java:349)
>   at 
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:400)
>   at 
> org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:109)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
>   at 
> org.apache.spark.sql.execution.RowIteratorFromScala.advanceNext(RowIterator.scala:83)
>   at 
> org.apache.spark.sql.execution.joins.SortMergeJoinScanner.advancedStreamed(SortMergeJoinExec.scala:778)
>   at 
> org.apache.spark.sql.execution.joins.SortMergeJoinScanner.findNextInnerJoinRows(SortMergeJoinExec.scala:685)
>   at 
> org.apache.spark.sql.execution.joins.SortMergeJoinExec$$anonfun$doExecute$1$$anon$2.advanceNext(SortMergeJoinExec.scala:259)
>   at 
> org.apache.spark.sql.execution.RowIteratorToScala.hasNext(RowIterator.scala:68)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithKeys$(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
>   at 
> org.apache.spark.sche

[jira] [Commented] (SPARK-21894) Some Netty errors do not propagate to the top level driver

2017-09-03 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151930#comment-16151930
 ] 

Kazuaki Ishizaki commented on SPARK-21894:
--

Thank you for reporting this issue. Could you please attach a smaller program 
that can reproduce this problem?

> Some Netty errors do not propagate to the top level driver
> --
>
> Key: SPARK-21894
> URL: https://issues.apache.org/jira/browse/SPARK-21894
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: Charles Allen
>
> We have an environment with Netty 4.1 ( 
> https://issues.apache.org/jira/browse/SPARK-19552 for some context) and the 
> following error occurs. The reason THIS issue is being filed is because this 
> error leaves the Spark workload in a bad state where it does not make any 
> progress, and does not shut down.
> The expected behavior is that the spark job would throw an exception that can 
> be caught by the driving application.
> {code}
> 017-09-01T16:13:32,175 ERROR [shuffle-server-3-2] 
> org.apache.spark.network.server.TransportRequestHandler - Error sending 
> result StreamResponse{streamId=/jars/lz4-1.3.0.jar, byteCount=236880, 
> body=FileSegmentManagedBuffer{file=/Users/charlesallen/.m2/repository/net/jpountz/lz4/lz4/1.3.0/lz4-1.3.0.jar,
>  offset=0, length=236880}} to /192.168.59.3:56703; closing connection
> java.lang.AbstractMethodError
>   at io.netty.util.ReferenceCountUtil.touch(ReferenceCountUtil.java:73) 
> ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
>   at 
> io.netty.channel.DefaultChannelPipeline.touch(DefaultChannelPipeline.java:107)
>  ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:810)
>  ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:723)
>  ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
>   at 
> io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:111)
>  ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738)
>  ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:730)
>  ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:816)
>  ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:723)
>  ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
>   at 
> io.netty.handler.timeout.IdleStateHandler.write(IdleStateHandler.java:305) 
> ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738)
>  ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:801)
>  ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:814)
>  ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:794)
>  ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:831)
>  ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
>   at 
> io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1032)
>  ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
>   at 
> io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:296) 
> ~[netty-all-4.1.11.Final.jar:4.1.11.Final]
>   at 
> org.apache.spark.network.server.TransportRequestHandler.respond(TransportRequestHandler.java:194)
>  [spark-network-common_2.11-2.1.0-mmx9.jar:2.1.0-mmx9]
>   at 
> org.apache.spark.network.server.TransportRequestHandler.processStreamRequest(TransportRequestHandler.java:150)
>  [spark-network-common_2.11-2.1.0-mmx9.jar:2.1.0-mmx9]
>   at 
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:111)
>  [spark-network-common_2.11-2.1.0-mmx9.jar:2.1.0-mmx9]
>   at 
> org.apache.spark.network.server.Tra

Re: Welcoming Saisai (Jerry) Shao as a committer

2017-08-28 Thread Kazuaki Ishizaki
Congratulations, Jerry!

Kazuaki Ishizaki



From:   Hyukjin Kwon <gurwls...@gmail.com>
To: dev <dev@spark.apache.org>
Date:   2017/08/29 12:24
Subject:Re: Welcoming Saisai (Jerry) Shao as a committer



Congratulations! Very well deserved.

2017-08-29 11:41 GMT+09:00 Liwei Lin <lwl...@gmail.com>:
Congratulations, Jerry!

Cheers,
Liwei

On Tue, Aug 29, 2017 at 10:15 AM, 蒋星博 <jiangxb1...@gmail.com> wrote:
congs!

Takeshi Yamamuro <linguin@gmail.com> wrote on Mon, Aug 28, 2017 at 7:11 PM:
Congrats!

On Tue, Aug 29, 2017 at 11:04 AM, zhichao <lisurpr...@gmail.com> wrote:
Congratulations, Jerry!

On Tue, Aug 29, 2017 at 9:57 AM, Weiqing Yang <yangweiqing...@gmail.com> 
wrote:
Congratulations, Jerry!

On Mon, Aug 28, 2017 at 6:44 PM, Yanbo Liang <yblia...@gmail.com> wrote:
Congratulations, Jerry.

On Tue, Aug 29, 2017 at 9:42 AM, John Deng <mailt...@163.com> wrote:

Congratulations, Jerry !

On 8/29/2017 09:28, Matei Zaharia <matei.zaha...@gmail.com> wrote: 
Hi everyone, 

The PMC recently voted to add Saisai (Jerry) Shao as a committer. Saisai has 
been contributing to many areas of the project for a long time, so it's great 
to see him join. Join me in thanking and congratulating him! 

Matei 
- 
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org 







-- 
---
Takeshi Yamamuro






[jira] [Commented] (SPARK-18016) Code Generation: Constant Pool Past Limit for Wide/Nested Dataset

2017-08-25 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16141650#comment-16141650
 ] 

Kazuaki Ishizaki commented on SPARK-18016:
--

The issue {{Caused by: org.codehaus.janino.JaninoRuntimeException: Constant 
pool for class 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection
 has grown past JVM limit of 0x}} will be addressed by [this 
PR|https://github.com/apache/spark/pull/16648].

> Code Generation: Constant Pool Past Limit for Wide/Nested Dataset
> -
>
> Key: SPARK-18016
> URL: https://issues.apache.org/jira/browse/SPARK-18016
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Aleksander Eskilson
>Assignee: Aleksander Eskilson
> Fix For: 2.3.0
>
>
> When attempting to encode collections of large Java objects to Datasets 
> having very wide or deeply nested schemas, code generation can fail, yielding:
> {code}
> Caused by: org.codehaus.janino.JaninoRuntimeException: Constant pool for 
> class 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection
>  has grown past JVM limit of 0x
>   at 
> org.codehaus.janino.util.ClassFile.addToConstantPool(ClassFile.java:499)
>   at 
> org.codehaus.janino.util.ClassFile.addConstantNameAndTypeInfo(ClassFile.java:439)
>   at 
> org.codehaus.janino.util.ClassFile.addConstantMethodrefInfo(ClassFile.java:358)
>   at 
> org.codehaus.janino.UnitCompiler.writeConstantMethodrefInfo(UnitCompiler.java:4)
>   at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4547)
>   at org.codehaus.janino.UnitCompiler.access$7500(UnitCompiler.java:206)
>   at 
> org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:3774)
>   at 
> org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:3762)
>   at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:4328)
>   at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3762)
>   at 
> org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4933)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:3180)
>   at org.codehaus.janino.UnitCompiler.access$5000(UnitCompiler.java:206)
>   at 
> org.codehaus.janino.UnitCompiler$9.visitMethodInvocation(UnitCompiler.java:3151)
>   at 
> org.codehaus.janino.UnitCompiler$9.visitMethodInvocation(UnitCompiler.java:3139)
>   at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:4328)
>   at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3139)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2112)
>   at org.codehaus.janino.UnitCompiler.access$1700(UnitCompiler.java:206)
>   at 
> org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1377)
>   at 
> org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1370)
>   at org.codehaus.janino.Java$ExpressionStatement.accept(Java.java:2558)
>   at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:1370)
>   at 
> org.codehaus.janino.UnitCompiler.compileStatements(UnitCompiler.java:1450)
>   at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:2811)
>   at 
> org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1262)
>   at 
> org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1234)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:538)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:890)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:894)
>   at org.codehaus.janino.UnitCompiler.access$600(UnitCompiler.java:206)
>   at 
> org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:377)
>   at 
> org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:369)
>   at 
> org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1128)
>   at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:369)
>   at 
> org.codehaus.janino.UnitCompiler.compileDeclaredMemberTypes(UnitCompiler.java:1209)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:564)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:420)
>   at org.codehaus.janino.UnitCompiler.access$400(UnitCompiler.java:206)
>   at 
>

[jira] [Commented] (SPARK-21828) org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB...again

2017-08-25 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16141361#comment-16141361
 ] 

Kazuaki Ishizaki commented on SPARK-21828:
--

Thank you for your report. Some fixes solved this problem in Spark 2.2, but 
they were not backported to Spark 2.1.
If you need a backport to 2.1, please let us know here. I will then start 
identifying the root cause of this issue and backporting the relevant PR.

> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" 
> grows beyond 64 KB...again
> -
>
> Key: SPARK-21828
> URL: https://issues.apache.org/jira/browse/SPARK-21828
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Otis Smart
>Priority: Critical
>
> Hello!
> 1. I encounter a similar issue (see the text below) on PySpark 2.2 (e.g., a 
> dataframe with ~5 rows x 1100+ columns as input to the ".fit()" method of 
> CrossValidator(), which includes a Pipeline() containing StringIndexer(), 
> VectorAssembler() and DecisionTreeClassifier()).
> 2. Was the aforementioned patch (https://github.com/apache/spark/pull/15480) 
> not included in the latest release? What are the reason for, source of, and 
> solution to this persistent issue?
> py4j.protocol.Py4JJavaError: An error occurred while calling o9396.fit.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 38 
> in stage 18.0 failed 4 times, most recent failure: Lost task 38.3 in stage 
> 18.0 (TID 1996, ip-10-0-14-83.ec2.internal, executor 4): 
> java.util.concurrent.ExecutionException: java.lang.Exception: failed to 
> compile: org.codehaus.janino.JaninoRuntimeException: Code of method 
> "compare(Lorg/apache/spark/sql/catalyst/InternalRow;Lorg/apache/spark/sql/catalyst/InternalRow;)I"
>  of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" 
> grows beyond 64 KB
> /* 001 */ public SpecificOrdering generate(Object[] references)
> { /* 002 */ return new SpecificOrdering(references); /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificOrdering extends 
> org.apache.spark.sql.catalyst.expressions.codegen.BaseOrdering {
> /* 006 */
> /* 007 */ private Object[] references;
> /* 008 */
> /* 009 */
> /* 010 */ public SpecificOrdering(Object[] references)
> { /* 011 */ this.references = references; /* 012 */ /* 013 */ }
> /* 014 */
> /* 015 */
> /* 016 */
> /* 017 */ public int compare(InternalRow a, InternalRow b) {
> /* 018 */ InternalRow i = null; // Holds current row being evaluated.
> /* 019 */
> /* 020 */ i = a;
> /* 021 */ boolean isNullA;
> /* 022 */ double primitiveA;
> /* 023 */
> { /* 024 */ /* 025 */ double value = i.getDouble(0); /* 026 */ isNullA = 
> false; /* 027 */ primitiveA = value; /* 028 */ }
> /* 029 */ i = b;
> /* 030 */ boolean isNullB;
> /* 031 */ double primitiveB;
> /* 032 */
> { /* 033 */ /* 034 */ double value = i.getDouble(0); /* 035 */ isNullB = 
> false; /* 036 */ primitiveB = value; /* 037 */ }
> /* 038 */ if (isNullA && isNullB)
> { /* 039 */ // Nothing /* 040 */ }
> else if (isNullA)
> { /* 041 */ return -1; /* 042 */ }
> else if (isNullB)
> { /* 043 */ return 1; /* 044 */ }
> else {
> /* 045 */ int comp = 
> org.apache.spark.util.Utils.nanSafeCompareDoubles(primitiveA, primitiveB);
> /* 046 */ if (comp != 0)
> { /* 047 */ return comp; /* 048 */ }
> /* 049 */ }
> /* 050 */
> /* 051 */
> ...



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21828) org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB...again

2017-08-24 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16140282#comment-16140282
 ] 

Kazuaki Ishizaki commented on SPARK-21828:
--

Thank you for reporting the problem.
First, IIUC, the PR (https://github.com/apache/spark/pull/15480) has been 
included in the latest release. Thus, the test case "SPARK-16845..." in 
{{OrderingSuite.scala}} does not fail.

Could you please post a program that can reproduce this issue? I will then 
investigate it.
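
For reference, a minimal sketch of the kind of program that is useful here (the 
column count and names are only illustrative assumptions; any sufficiently wide 
ORDER BY exercises the generated {{SpecificOrdering.compare}} method):
{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[1]").appName("wide-ordering").getOrCreate()
val n = 1200  // illustrative width; adjust to match the failing job
// Build a DataFrame with n numeric columns derived from a single id column.
val wide = spark.range(100).select((0 until n).map(i => (col("id") + i).as(s"c$i")): _*)
// Sorting on every column forces codegen of a SpecificOrdering over n columns.
wide.orderBy((0 until n).map(i => col(s"c$i")): _*).show(1)
{code}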

> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" 
> grows beyond 64 KB...again
> -
>
> Key: SPARK-21828
> URL: https://issues.apache.org/jira/browse/SPARK-21828
> Project: Spark
>  Issue Type: Bug
>  Components: ML, SQL
>Affects Versions: 2.2.0
>Reporter: Otis Smart
>Priority: Critical
>
> Hello!
> 1. I encounter a similar issue (see below text) on Pyspark 2.2 (e.g., 
> dataframe with ~5 rows x 1100+ columns as input to ".fit()" method of 
> CrossValidator() that includes Pipeline() that includes StringIndexer(), 
> VectorAssembler() and DecisionTreeClassifier()).
> 2. Was the aforementioned patch (i.e. the fix in 
> https://github.com/apache/spark/pull/15480) not included in the latest 
> release? What is the reason for, and the solution to, this persistent 
> issue?
> py4j.protocol.Py4JJavaError: An error occurred while calling o9396.fit.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 38 
> in stage 18.0 failed 4 times, most recent failure: Lost task 38.3 in stage 
> 18.0 (TID 1996, ip-10-0-14-83.ec2.internal, executor 4): 
> java.util.concurrent.ExecutionException: java.lang.Exception: failed to 
> compile: org.codehaus.janino.JaninoRuntimeException: Code of method 
> "compare(Lorg/apache/spark/sql/catalyst/InternalRow;Lorg/apache/spark/sql/catalyst/InternalRow;)I"
>  of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" 
> grows beyond 64 KB
> /* 001 */ public SpecificOrdering generate(Object[] references)
> { /* 002 */ return new SpecificOrdering(references); /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificOrdering extends 
> org.apache.spark.sql.catalyst.expressions.codegen.BaseOrdering {
> /* 006 */
> /* 007 */ private Object[] references;
> /* 008 */
> /* 009 */
> /* 010 */ public SpecificOrdering(Object[] references)
> { /* 011 */ this.references = references; /* 012 */ /* 013 */ }
> /* 014 */
> /* 015 */
> /* 016 */
> /* 017 */ public int compare(InternalRow a, InternalRow b) {
> /* 018 */ InternalRow i = null; // Holds current row being evaluated.
> /* 019 */
> /* 020 */ i = a;
> /* 021 */ boolean isNullA;
> /* 022 */ double primitiveA;
> /* 023 */
> { /* 024 */ /* 025 */ double value = i.getDouble(0); /* 026 */ isNullA = 
> false; /* 027 */ primitiveA = value; /* 028 */ }
> /* 029 */ i = b;
> /* 030 */ boolean isNullB;
> /* 031 */ double primitiveB;
> /* 032 */
> { /* 033 */ /* 034 */ double value = i.getDouble(0); /* 035 */ isNullB = 
> false; /* 036 */ primitiveB = value; /* 037 */ }
> /* 038 */ if (isNullA && isNullB)
> { /* 039 */ // Nothing /* 040 */ }
> else if (isNullA)
> { /* 041 */ return -1; /* 042 */ }
> else if (isNullB)
> { /* 043 */ return 1; /* 044 */ }
> else {
> /* 045 */ int comp = 
> org.apache.spark.util.Utils.nanSafeCompareDoubles(primitiveA, primitiveB);
> /* 046 */ if (comp != 0)
> { /* 047 */ return comp; /* 048 */ }
> /* 049 */ }
> /* 050 */
> /* 051 */
> ...



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21750) Use arrow 0.6.0

2017-08-22 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16136920#comment-16136920
 ] 

Kazuaki Ishizaki commented on SPARK-21750:
--

Closed this since upgrading Arrow requires upgrading the Jenkins environment for 
the Python side. For now, it is not necessary to upgrade Arrow on the Python 
side. Details are in the discussion in the PR.

> Use arrow 0.6.0
> ---
>
> Key: SPARK-21750
> URL: https://issues.apache.org/jira/browse/SPARK-21750
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>Priority: Minor
>
> Since [Arrow 0.6.0|http://arrow.apache.org/release/0.6.0.html] has been 
> released, use the latest one



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-21750) Use arrow 0.6.0

2017-08-22 Thread Kazuaki Ishizaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki closed SPARK-21750.

Resolution: Won't Fix

> Use arrow 0.6.0
> ---
>
> Key: SPARK-21750
> URL: https://issues.apache.org/jira/browse/SPARK-21750
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>Priority: Minor
>
> Since [Arrow 0.6.0|http://arrow.apache.org/release/0.6.0.html] has been 
> released, use the latest one



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21794) exception about reading task serial data(broadcast) value when the storage memory is not enough to unroll

2017-08-20 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16134440#comment-16134440
 ] 

Kazuaki Ishizaki commented on SPARK-21794:
--

Thank you for reporting this issue. Could you please attach a program that can 
reproduce this problem?

> exception about reading task serial data(broadcast) value when the storage 
> memory is not enough to unroll
> -
>
> Key: SPARK-21794
> URL: https://issues.apache.org/jira/browse/SPARK-21794
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.1, 2.1.1
>Reporter: roncenzhao
> Attachments: error stack.png
>
>
> ```
> 17/08/09 19:27:43 ERROR Utils: Exception encountered
> java.util.NoSuchElementException
>   at 
> org.apache.spark.util.collection.PrimitiveVector$$anon$1.next(PrimitiveVector.scala:58)
>   at 
> org.apache.spark.storage.memory.PartiallyUnrolledIterator.next(MemoryStore.scala:697)
>   at 
> org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:30)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$2.apply(TorrentBroadcast.scala:178)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$2.apply(TorrentBroadcast.scala:178)
>   at scala.Option.map(Option.scala:146)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:178)
>   at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1276)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:174)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:65)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:65)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:89)
>   at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:72)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
>   at org.apache.spark.scheduler.Task.run(Task.scala:86)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> 17/08/09 19:27:43 INFO UnifiedMemoryManager: Will not store broadcast_5 as 
> the required space (1048576 bytes) exceeds our memory limit (878230 bytes)
> 17/08/09 19:27:43 WARN MemoryStore: Failed to reserve initial memory 
> threshold of 1024.0 KB for computing block broadcast_5 in memory.
> 17/08/09 19:27:43 WARN MemoryStore: Not enough space to cache broadcast_5 in 
> memory! (computed 384.0 B so far)
> 17/08/09 19:27:43 INFO MemoryStore: Memory use = 857.6 KB (blocks) + 0.0 B 
> (scratch space shared across 0 tasks(s)) = 857.6 KB. Storage limit = 857.6 KB.
> 17/08/09 19:27:43 ERROR Utils: Exception encountered
> java.util.NoSuchElementException
>   at 
> org.apache.spark.util.collection.PrimitiveVector$$anon$1.next(PrimitiveVector.scala:58)
>   at 
> org.apache.spark.storage.memory.PartiallyUnrolledIterator.next(MemoryStore.scala:697)
>   at 
> org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:30)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$2.apply(TorrentBroadcast.scala:178)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$2.apply(TorrentBroadcast.scala:178)
>   at scala.Option.map(Option.scala:146)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:178)
>   at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1276)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:174)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:65)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:65)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:89)
>   at org.apache.spark.broadcast.Broadcast.value(Broadc

[jira] [Commented] (SPARK-21776) How to use the memory-mapped file on Spark??

2017-08-17 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16131681#comment-16131681
 ] 

Kazuaki Ishizaki commented on SPARK-21776:
--

Is this a question? If so, it would be better to send a message to 
u...@spark.apache.org OR d...@spark.apache.org.

> How to use the memory-mapped file on Spark??
> 
>
> Key: SPARK-21776
> URL: https://issues.apache.org/jira/browse/SPARK-21776
> Project: Spark
>  Issue Type: Bug
>  Components: Block Manager, Documentation, Input/Output, Spark Core
>Affects Versions: 2.1.1
> Environment: Spark 2.1.1 
> Scala 2.11.8
>Reporter: zhaP524
> Attachments: screenshot-1.png, screenshot-2.png
>
>
>   In production, we fully load an HBase table in Spark and join it against a 
> dimension table to generate business data. Because the base table is loaded in 
> full, memory pressure is very high. I want to know whether Spark can use 
> memory-mapped files to deal with this. Is there such a mechanism, and how is it 
> used?
>   I also found a Spark parameter, spark.storage.memoryMapThreshold=2m; it is 
> not clear what this parameter is used for.
>   There are putBytes and getBytes methods in DiskStore.scala in the Spark 
> source code; are these related to the memory-mapped files mentioned above?
>   Please let me know if anything is unclear.
> Thank you!
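
For context, a minimal sketch of how this setting is supplied (the 8m value is 
only an example). {{spark.storage.memoryMapThreshold}} controls the block size 
above which Spark memory-maps files when reading blocks back from its own disk 
store; it does not memory-map arbitrary input data such as HBase tables.
{code}
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf = new SparkConf()
  .setAppName("memory-map-threshold-example")
  // Blocks larger than this are memory-mapped by DiskStore when read from disk.
  .set("spark.storage.memoryMapThreshold", "8m")
val spark = SparkSession.builder().config(conf).getOrCreate()
{code}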



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21720) Filter predicate with many conditions throw stackoverflow error

2017-08-17 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16130231#comment-16130231
 ] 

Kazuaki Ishizaki commented on SPARK-21720:
--

I identified issues in {{predicates.scala}}. I am creating fixes.

> Filter predicate with many conditions throw stackoverflow error
> ---
>
> Key: SPARK-21720
> URL: https://issues.apache.org/jira/browse/SPARK-21720
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: srinivasan
>
> When trying to filter on dataset with many predicate conditions on both spark 
> sql and dataset filter transformation as described below, spark throws a 
> stackoverflow exception
> Case 1: Filter Transformation on Data
> Dataset filter = sourceDataset.filter(String.format("not(%s)", 
> buildQuery()));
> filter.show();
> where buildQuery() returns
> Field1 = "" and  Field2 = "" and  Field3 = "" and  Field4 = "" and  Field5 = 
> "" and  BLANK_5 = "" and  Field7 = "" and  Field8 = "" and  Field9 = "" and  
> Field10 = "" and  Field11 = "" and  Field12 = "" and  Field13 = "" and  
> Field14 = "" and  Field15 = "" and  Field16 = "" and  Field17 = "" and  
> Field18 = "" and  Field19 = "" and  Field20 = "" and  Field21 = "" and  
> Field22 = "" and  Field23 = "" and  Field24 = "" and  Field25 = "" and  
> Field26 = "" and  Field27 = "" and  Field28 = "" and  Field29 = "" and  
> Field30 = "" and  Field31 = "" and  Field32 = "" and  Field33 = "" and  
> Field34 = "" and  Field35 = "" and  Field36 = "" and  Field37 = "" and  
> Field38 = "" and  Field39 = "" and  Field40 = "" and  Field41 = "" and  
> Field42 = "" and  Field43 = "" and  Field44 = "" and  Field45 = "" and  
> Field46 = "" and  Field47 = "" and  Field48 = "" and  Field49 = "" and  
> Field50 = "" and  Field51 = "" and  Field52 = "" and  Field53 = "" and  
> Field54 = "" and  Field55 = "" and  Field56 = "" and  Field57 = "" and  
> Field58 = "" and  Field59 = "" and  Field60 = "" and  Field61 = "" and  
> Field62 = "" and  Field63 = "" and  Field64 = "" and  Field65 = "" and  
> Field66 = "" and  Field67 = "" and  Field68 = "" and  Field69 = "" and  
> Field70 = "" and  Field71 = "" and  Field72 = "" and  Field73 = "" and  
> Field74 = "" and  Field75 = "" and  Field76 = "" and  Field77 = "" and  
> Field78 = "" and  Field79 = "" and  Field80 = "" and  Field81 = "" and  
> Field82 = "" and  Field83 = "" and  Field84 = "" and  Field85 = "" and  
> Field86 = "" and  Field87 = "" and  Field88 = "" and  Field89 = "" and  
> Field90 = "" and  Field91 = "" and  Field92 = "" and  Field93 = "" and  
> Field94 = "" and  Field95 = "" and  Field96 = "" and  Field97 = "" and  
> Field98 = "" and  Field99 = "" and  Field100 = "" and  Field101 = "" and  
> Field102 = "" and  Field103 = "" and  Field104 = "" and  Field105 = "" and  
> Field106 = "" and  Field107 = "" and  Field108 = "" and  Field109 = "" and  
> Field110 = "" and  Field111 = "" and  Field112 = "" and  Field113 = "" and  
> Field114 = "" and  Field115 = "" and  Field116 = "" and  Field117 = "" and  
> Field118 = "" and  Field119 = "" and  Field120 = "" and  Field121 = "" and  
> Field122 = "" and  Field123 = "" and  Field124 = "" and  Field125 = "" and  
> Field126 = "" and  Field127 = "" and  Field128 = "" and  Field129 = "" and  
> Field130 = &

[jira] [Created] (SPARK-21751) CodeGeneraor.splitExpressions counts code size more precisely

2017-08-16 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-21751:


 Summary: CodeGeneraor.splitExpressions counts code size more 
precisely
 Key: SPARK-21751
 URL: https://issues.apache.org/jira/browse/SPARK-21751
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.2.0
Reporter: Kazuaki Ishizaki
Priority: Minor


Currently, {{CodeGeneraor.splitExpressions}} splits statements if their total 
length is more than 1200 characters. This length may include comments or empty 
lines.
It would be good to exclude comments and empty lines to reduce the number of 
generated methods in a class.
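
A rough sketch of the idea (a hypothetical helper, not the actual 
{{CodeGenerator}} implementation): measure only non-blank, non-comment lines 
when deciding whether to split.
{code}
// Hypothetical helper illustrating the proposal: skip blank lines and
// comment-only lines when measuring the size of a generated code block.
def effectiveCodeLength(code: String): Int =
  code.split("\n")
    .map(_.trim)
    .filterNot(line => line.isEmpty || line.startsWith("//") ||
      line.startsWith("/*") || line.startsWith("*"))
    .map(_.length)
    .sum

// A block would then be split when effectiveCodeLength(blockCode) exceeds the
// threshold, rather than when the raw string length (comments included) does.
{code}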



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21750) Use arrow 0.6.0

2017-08-16 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129133#comment-16129133
 ] 

Kazuaki Ishizaki commented on SPARK-21750:
--

Waiting for it to appear on mvnrepository

> Use arrow 0.6.0
> ---
>
> Key: SPARK-21750
> URL: https://issues.apache.org/jira/browse/SPARK-21750
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>Priority: Minor
>
> Since [Arrow 0.6.0|http://arrow.apache.org/release/0.6.0.html] has been 
> released, use the latest one



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21750) Use arrow 0.6.0

2017-08-16 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-21750:


 Summary: Use arrow 0.6.0
 Key: SPARK-21750
 URL: https://issues.apache.org/jira/browse/SPARK-21750
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.3.0
Reporter: Kazuaki Ishizaki
Priority: Minor


Since [Arrow 0.6.0|http://arrow.apache.org/release/0.6.0.html] has been 
released, use the latest one



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-21720) Filter predicate with many conditions throw stackoverflow error

2017-08-15 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16127477#comment-16127477
 ] 

Kazuaki Ishizaki edited comment on SPARK-21720 at 8/15/17 4:26 PM:
---

In this case, adding the JVM option {{-Xss512m}} eliminates this exception and 
the query works well.

However, when the number of fields is 1024, I got the following exception:
{code}
08:41:40.022 ERROR 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator: failed to 
compile: org.codehaus.janino.JaninoRuntimeException: Code of method 
"apply(Lorg/apache/spark/sql/catalyst/InternalRow;)Lorg/apache/spark/sql/catalyst/expressions/UnsafeRow;"
 of class 
"org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection"
 grows beyond 64 KB
...
{code}

I am working on solving this 64KB problem.


was (Author: kiszk):
In this case, adding the JVM option {{-Xss512m}} eliminates this exception and 
the query works well.

When the number of fields is 1024, I got the following exception:
{code}
08:41:40.022 ERROR 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator: failed to 
compile: org.codehaus.janino.JaninoRuntimeException: Code of method 
"apply(Lorg/apache/spark/sql/catalyst/InternalRow;)Lorg/apache/spark/sql/catalyst/expressions/UnsafeRow;"
 of class 
"org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection"
 grows beyond 64 KB
...
{code}

I am working on solving this 64KB problem.

> Filter predicate with many conditions throw stackoverflow error
> ---
>
> Key: SPARK-21720
> URL: https://issues.apache.org/jira/browse/SPARK-21720
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: srinivasan
>
> When trying to filter on dataset with many predicate conditions on both spark 
> sql and dataset filter transformation as described below, spark throws a 
> stackoverflow exception
> Case 1: Filter Transformation on Data
> Dataset filter = sourceDataset.filter(String.format("not(%s)", 
> buildQuery()));
> filter.show();
> where buildQuery() returns
> Field1 = "" and  Field2 = "" and  Field3 = "" and  Field4 = "" and  Field5 = 
> "" and  BLANK_5 = "" and  Field7 = "" and  Field8 = "" and  Field9 = "" and  
> Field10 = "" and  Field11 = "" and  Field12 = "" and  Field13 = "" and  
> Field14 = "" and  Field15 = "" and  Field16 = "" and  Field17 = "" and  
> Field18 = "" and  Field19 = "" and  Field20 = "" and  Field21 = "" and  
> Field22 = "" and  Field23 = "" and  Field24 = "" and  Field25 = "" and  
> Field26 = "" and  Field27 = "" and  Field28 = "" and  Field29 = "" and  
> Field30 = "" and  Field31 = "" and  Field32 = "" and  Field33 = "" and  
> Field34 = "" and  Field35 = "" and  Field36 = "" and  Field37 = "" and  
> Field38 = "" and  Field39 = "" and  Field40 = "" and  Field41 = "" and  
> Field42 = "" and  Field43 = "" and  Field44 = "" and  Field45 = "" and  
> Field46 = "" and  Field47 = "" and  Field48 = "" and  Field49 = "" and  
> Field50 = "" and  Field51 = "" and  Field52 = "" and  Field53 = "" and  
> Field54 = "" and  Field55 = "" and  Field56 = "" and  Field57 = "" and  
> Field58 = "" and  Field59 = "" and  Field60 = "" and  Field61 = "" and  
> Field62 = "" and  Field63 = "" and  Field64 = "" and  Field65 = "" and  
> Field66 = "" and  Field67 = "" and  Field68 = "" and  Field69 = "" and  
> Field70 = "" and  Field71 = "" and  Field72 = "" and  Field73 = "" and  
> Field74 = "" and  Field75 = "" and  Field76 = "" and  Field77 = "" and  
> Field78 = "" and  Field79 = "" and  Field80 = "" and  Field81 = "" and  
> Field82 = "" and  Field83 = "" and  Field84 = "" and  Field85 = "" and  
> Field86 = "" and  Field87 = "" and  Field88 = ""

[jira] [Commented] (SPARK-21720) Filter predicate with many conditions throw stackoverflow error

2017-08-15 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16127477#comment-16127477
 ] 

Kazuaki Ishizaki commented on SPARK-21720:
--

In this case, adding the JVM option {{-Xss512m}} eliminates this exception and 
the query works well.

When the number of fields is 1024, I got the following exception:
{code}
08:41:40.022 ERROR 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator: failed to 
compile: org.codehaus.janino.JaninoRuntimeException: Code of method 
"apply(Lorg/apache/spark/sql/catalyst/InternalRow;)Lorg/apache/spark/sql/catalyst/expressions/UnsafeRow;"
 of class 
"org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection"
 grows beyond 64 KB
...
{code}

I am working on solving this 64KB problem.
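
A minimal sketch of where such an option goes, assuming the 512m value above. 
Note that driver JVM options only take effect when supplied at launch time 
(e.g. spark-defaults.conf or {{spark-submit --conf}}); a SparkConf built inside 
an already-running driver cannot change its own stack size, so the keys are 
shown here mainly for reference:
{code}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Deeply nested Catalyst expression trees are analyzed recursively on the driver.
  .set("spark.driver.extraJavaOptions", "-Xss512m")
  // Generated code for the same predicate may also recurse on executors.
  .set("spark.executor.extraJavaOptions", "-Xss512m")
{code}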

> Filter predicate with many conditions throw stackoverflow error
> ---
>
> Key: SPARK-21720
> URL: https://issues.apache.org/jira/browse/SPARK-21720
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: srinivasan
>
> When trying to filter on dataset with many predicate conditions on both spark 
> sql and dataset filter transformation as described below, spark throws a 
> stackoverflow exception
> Case 1: Filter Transformation on Data
> Dataset filter = sourceDataset.filter(String.format("not(%s)", 
> buildQuery()));
> filter.show();
> where buildQuery() returns
> Field1 = "" and  Field2 = "" and  Field3 = "" and  Field4 = "" and  Field5 = 
> "" and  BLANK_5 = "" and  Field7 = "" and  Field8 = "" and  Field9 = "" and  
> Field10 = "" and  Field11 = "" and  Field12 = "" and  Field13 = "" and  
> Field14 = "" and  Field15 = "" and  Field16 = "" and  Field17 = "" and  
> Field18 = "" and  Field19 = "" and  Field20 = "" and  Field21 = "" and  
> Field22 = "" and  Field23 = "" and  Field24 = "" and  Field25 = "" and  
> Field26 = "" and  Field27 = "" and  Field28 = "" and  Field29 = "" and  
> Field30 = "" and  Field31 = "" and  Field32 = "" and  Field33 = "" and  
> Field34 = "" and  Field35 = "" and  Field36 = "" and  Field37 = "" and  
> Field38 = "" and  Field39 = "" and  Field40 = "" and  Field41 = "" and  
> Field42 = "" and  Field43 = "" and  Field44 = "" and  Field45 = "" and  
> Field46 = "" and  Field47 = "" and  Field48 = "" and  Field49 = "" and  
> Field50 = "" and  Field51 = "" and  Field52 = "" and  Field53 = "" and  
> Field54 = "" and  Field55 = "" and  Field56 = "" and  Field57 = "" and  
> Field58 = "" and  Field59 = "" and  Field60 = "" and  Field61 = "" and  
> Field62 = "" and  Field63 = "" and  Field64 = "" and  Field65 = "" and  
> Field66 = "" and  Field67 = "" and  Field68 = "" and  Field69 = "" and  
> Field70 = "" and  Field71 = "" and  Field72 = "" and  Field73 = "" and  
> Field74 = "" and  Field75 = "" and  Field76 = "" and  Field77 = "" and  
> Field78 = "" and  Field79 = "" and  Field80 = "" and  Field81 = "" and  
> Field82 = "" and  Field83 = "" and  Field84 = "" and  Field85 = "" and  
> Field86 = "" and  Field87 = "" and  Field88 = "" and  Field89 = "" and  
> Field90 = "" and  Field91 = "" and  Field92 = "" and  Field93 = "" and  
> Field94 = "" and  Field95 = "" and  Field96 = "" and  Field97 = "" and  
> Field98 = "" and  Field99 = "" and  Field100 = "" and  Field101 = "" and  
> Field102 = "" and  Field103 = "" and  Field104 = "" and  Field105 = "" and  
> Field106 = "" and  Field107 = "" and  Field108 = "" and  Field109 = "" and  
> Field110 = "" and  Field111 = "" and  Field112 =

[jira] [Commented] (SPARK-21720) Filter predicate with many conditions throw stackoverflow error

2017-08-13 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16125253#comment-16125253
 ] 

Kazuaki Ishizaki commented on SPARK-21720:
--

I confirmed that this occurs in the master branch. I will work on this.

> Filter predicate with many conditions throw stackoverflow error
> ---
>
> Key: SPARK-21720
> URL: https://issues.apache.org/jira/browse/SPARK-21720
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: srinivasan
>
> When trying to filter on dataset with many predicate conditions on both spark 
> sql and dataset filter transformation as described below, spark throws a 
> stackoverflow exception
> Case 1: Filter Transformation on Data
> Dataset filter = sourceDataset.filter(String.format("not(%s)", 
> buildQuery()));
> filter.show();
> where buildQuery() returns
> Field1 = "" and  Field2 = "" and  Field3 = "" and  Field4 = "" and  Field5 = 
> "" and  BLANK_5 = "" and  Field7 = "" and  Field8 = "" and  Field9 = "" and  
> Field10 = "" and  Field11 = "" and  Field12 = "" and  Field13 = "" and  
> Field14 = "" and  Field15 = "" and  Field16 = "" and  Field17 = "" and  
> Field18 = "" and  Field19 = "" and  Field20 = "" and  Field21 = "" and  
> Field22 = "" and  Field23 = "" and  Field24 = "" and  Field25 = "" and  
> Field26 = "" and  Field27 = "" and  Field28 = "" and  Field29 = "" and  
> Field30 = "" and  Field31 = "" and  Field32 = "" and  Field33 = "" and  
> Field34 = "" and  Field35 = "" and  Field36 = "" and  Field37 = "" and  
> Field38 = "" and  Field39 = "" and  Field40 = "" and  Field41 = "" and  
> Field42 = "" and  Field43 = "" and  Field44 = "" and  Field45 = "" and  
> Field46 = "" and  Field47 = "" and  Field48 = "" and  Field49 = "" and  
> Field50 = "" and  Field51 = "" and  Field52 = "" and  Field53 = "" and  
> Field54 = "" and  Field55 = "" and  Field56 = "" and  Field57 = "" and  
> Field58 = "" and  Field59 = "" and  Field60 = "" and  Field61 = "" and  
> Field62 = "" and  Field63 = "" and  Field64 = "" and  Field65 = "" and  
> Field66 = "" and  Field67 = "" and  Field68 = "" and  Field69 = "" and  
> Field70 = "" and  Field71 = "" and  Field72 = "" and  Field73 = "" and  
> Field74 = "" and  Field75 = "" and  Field76 = "" and  Field77 = "" and  
> Field78 = "" and  Field79 = "" and  Field80 = "" and  Field81 = "" and  
> Field82 = "" and  Field83 = "" and  Field84 = "" and  Field85 = "" and  
> Field86 = "" and  Field87 = "" and  Field88 = "" and  Field89 = "" and  
> Field90 = "" and  Field91 = "" and  Field92 = "" and  Field93 = "" and  
> Field94 = "" and  Field95 = "" and  Field96 = "" and  Field97 = "" and  
> Field98 = "" and  Field99 = "" and  Field100 = "" and  Field101 = "" and  
> Field102 = "" and  Field103 = "" and  Field104 = "" and  Field105 = "" and  
> Field106 = "" and  Field107 = "" and  Field108 = "" and  Field109 = "" and  
> Field110 = "" and  Field111 = "" and  Field112 = "" and  Field113 = "" and  
> Field114 = "" and  Field115 = "" and  Field116 = "" and  Field117 = "" and  
> Field118 = "" and  Field119 = "" and  Field120 = "" and  Field121 = "" and  
> Field122 = "" and  Field123 = "" and  Field124 = "" and  Field125 = "" and  
> Field126 = "" and  Field127 = "" and  Field128 = "" and  Field129 = "" and  
> Field130 = &

[jira] [Comment Edited] (SPARK-19372) Code generation for Filter predicate including many OR conditions exceeds JVM method size limit

2017-08-13 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16124982#comment-16124982
 ] 

Kazuaki Ishizaki edited comment on SPARK-19372 at 8/13/17 5:05 PM:
---

[~srinivasanm] I can reproduce this issue using the master branch. I think this 
is a different problem.
Could you please create another JIRA entry to track it? I will work on this.



was (Author: kiszk):
[~srinivasanm] I can reproduce this issue using the master branch. I think this 
is a different problem.
Could you please create another JIRA entry to track it?


> Code generation for Filter predicate including many OR conditions exceeds JVM 
> method size limit 
> 
>
> Key: SPARK-19372
> URL: https://issues.apache.org/jira/browse/SPARK-19372
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Jay Pranavamurthi
>    Assignee: Kazuaki Ishizaki
> Fix For: 2.2.0, 2.3.0
>
> Attachments: wide400cols.csv
>
>
> For the attached csv file, the code below causes the exception 
> "org.codehaus.janino.JaninoRuntimeException: Code of method 
> "(Lorg/apache/spark/sql/catalyst/InternalRow;)Z" of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate" 
> grows beyond 64 KB
> Code:
> {code:borderStyle=solid}
>   import org.apache.spark.SparkConf
>   import org.apache.spark.sql.SparkSession
>   import org.apache.spark.sql.functions.lit
>   val conf = new SparkConf().setMaster("local[1]")
>   val sqlContext = 
> SparkSession.builder().config(conf).getOrCreate().sqlContext
>   val dataframe =
> sqlContext
>   .read
>   .format("com.databricks.spark.csv")
>   .load("wide400cols.csv")
>   val filter = (0 to 399)
> .foldLeft(lit(false))((e, index) => 
> e.or(dataframe.col(dataframe.columns(index)) =!= s"column${index+1}"))
>   val filtered = dataframe.filter(filter)
>   filtered.show(100)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19372) Code generation for Filter predicate including many OR conditions exceeds JVM method size limit

2017-08-13 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16124982#comment-16124982
 ] 

Kazuaki Ishizaki commented on SPARK-19372:
--

[~srinivasanm] I can reproduce this issue using the master branch. I think this 
is a different problem.
Could you please create another JIRA entry to track it?


> Code generation for Filter predicate including many OR conditions exceeds JVM 
> method size limit 
> 
>
> Key: SPARK-19372
> URL: https://issues.apache.org/jira/browse/SPARK-19372
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Jay Pranavamurthi
>    Assignee: Kazuaki Ishizaki
> Fix For: 2.2.0, 2.3.0
>
> Attachments: wide400cols.csv
>
>
> For the attached csv file, the code below causes the exception 
> "org.codehaus.janino.JaninoRuntimeException: Code of method 
> "(Lorg/apache/spark/sql/catalyst/InternalRow;)Z" of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate" 
> grows beyond 64 KB
> Code:
> {code:borderStyle=solid}
>   import org.apache.spark.SparkConf
>   import org.apache.spark.sql.SparkSession
>   import org.apache.spark.sql.functions.lit
>   val conf = new SparkConf().setMaster("local[1]")
>   val sqlContext = 
> SparkSession.builder().config(conf).getOrCreate().sqlContext
>   val dataframe =
> sqlContext
>   .read
>   .format("com.databricks.spark.csv")
>   .load("wide400cols.csv")
>   val filter = (0 to 399)
> .foldLeft(lit(false))((e, index) => 
> e.or(dataframe.col(dataframe.columns(index)) =!= s"column${index+1}"))
>   val filtered = dataframe.filter(filter)
>   filtered.show(100)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19372) Code generation for Filter predicate including many OR conditions exceeds JVM method size limit

2017-08-13 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16124856#comment-16124856
 ] 

Kazuaki Ishizaki commented on SPARK-19372:
--

Thank you for letting us know about the problem. I will investigate it.

> Code generation for Filter predicate including many OR conditions exceeds JVM 
> method size limit 
> 
>
> Key: SPARK-19372
> URL: https://issues.apache.org/jira/browse/SPARK-19372
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Jay Pranavamurthi
>    Assignee: Kazuaki Ishizaki
> Fix For: 2.2.0, 2.3.0
>
> Attachments: wide400cols.csv
>
>
> For the attached csv file, the code below causes the exception 
> "org.codehaus.janino.JaninoRuntimeException: Code of method 
> "(Lorg/apache/spark/sql/catalyst/InternalRow;)Z" of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate" 
> grows beyond 64 KB
> Code:
> {code:borderStyle=solid}
>   import org.apache.spark.SparkConf
>   import org.apache.spark.sql.SparkSession
>   import org.apache.spark.sql.functions.lit
>   val conf = new SparkConf().setMaster("local[1]")
>   val sqlContext = 
> SparkSession.builder().config(conf).getOrCreate().sqlContext
>   val dataframe =
> sqlContext
>   .read
>   .format("com.databricks.spark.csv")
>   .load("wide400cols.csv")
>   val filter = (0 to 399)
> .foldLeft(lit(false))((e, index) => 
> e.or(dataframe.col(dataframe.columns(index)) =!= s"column${index+1}"))
>   val filtered = dataframe.filter(filter)
>   filtered.show(100)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21276) Update lz4-java to remove custom LZ4BlockInputStream

2017-08-08 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16118402#comment-16118402
 ] 

Kazuaki Ishizaki commented on SPARK-21276:
--

Would it be better to update the affected version?

> Update  lz4-java to remove custom LZ4BlockInputStream
> -
>
> Key: SPARK-21276
> URL: https://issues.apache.org/jira/browse/SPARK-21276
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.1.1
>Reporter: Takeshi Yamamuro
>Priority: Trivial
>
> We currently use custom LZ4BlockInputStream to read concatenated byte stream 
> in shuffle 
> (https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/io/LZ4BlockInputStream.java#L38).
>  In the recent pr (https://github.com/lz4/lz4-java/pull/105), this 
> functionality is implemented even in lz4-java upstream. So, we might update 
> the lz4-java package that will be released in near future.
> Issue about the next lz4-java release
> https://github.com/lz4/lz4-java/issues/98
> Diff between the latest release and the master in lz4-java
> https://github.com/lz4/lz4-java/compare/62f7547abb0819d1ca1e669645ee1a9d26cd60b0...6480bd9e06f92471bf400c16d4d5f3fd2afa3b3d
>  * fixed NPE in XXHashFactory similarly
>  * Don't place resources in default package to support shading
>  * Fixes ByteBuffer methods failing to apply arrayOffset() for array-backed
>  * Try to load lz4-java from java.library.path, then fallback to bundled
>  * Add ppc64le binary
>  * Add s390x JNI binding
>  * Add basic LZ4 Frame v1.5.0 support
>  * enable aarch64 support for lz4-java
>  * Allow unsafeInstance() for ppc64le archiecture
>  * Add unsafeInstance support for AArch64
>  * Support 64-bit JNI build on Solaris
>  * Avoid over-allocating a buffer
>  * Allow EndMark to be incompressible for LZ4FrameInputStream.
>  * Concat byte stream



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: tuning - Spark data serialization for cache() ?

2017-08-08 Thread Kazuaki Ishizaki
For DataFrame (and Dataset) cache(), neither Java nor Kryo serialization 
is used. There is no way to use Java or Kryo serialization for the in-memory 
data of DataFrame.cache() or Dataset.cache().
Are you talking about serialization to disk? In my previous mail, I talked 
only about the in-memory case.
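
As a minimal sketch of this point (the storage level and numbers are just 
examples): Dataset/DataFrame cache() and persist() store data in Spark's own 
compressed columnar format, while spark.serializer (Java or Kryo) applies to 
RDD-level serialization such as shuffling RDDs of objects or serialized RDD 
storage levels.

import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder()
  .appName("df-cache-example")
  // Affects RDD shuffle / serialized RDD storage, not Dataset.cache():
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .getOrCreate()

val df = spark.range(0L, 1000000L).selectExpr("id", "id * 2 AS doubled")
df.persist(StorageLevel.MEMORY_AND_DISK)  // stored as compressed columnar batches
df.count()                                // first action materializes the cache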

Regards, 
Kazuaki Ishizaki



From:   Ofir Manor <ofir.ma...@equalum.io>
To: Kazuaki Ishizaki <ishiz...@jp.ibm.com>
Cc: user <user@spark.apache.org>
Date:   2017/08/08 03:12
Subject:Re: tuning - Spark data serialization for cache() ?



Thanks a lot for the quick pointer!
So, is the advice I linked to in the official Spark 2.2 documentation 
misleading? Are you saying that Spark 2.2 does not use Java 
serialization? And is the tip to switch to Kryo also outdated?

Ofir Manor
Co-Founder & CTO | Equalum
Mobile: +972-54-7801286 | Email: ofir.ma...@equalum.io

On Mon, Aug 7, 2017 at 8:47 PM, Kazuaki Ishizaki <ishiz...@jp.ibm.com> 
wrote:
For Dataframe (and Dataset), cache() already uses fast 
serialization/deserialization with data compression schemes.

We have already identified some performance issues regarding cache(). We are 
working on alleviating these issues in 
https://issues.apache.org/jira/browse/SPARK-14098.
We expect that these PRs will be integrated into Spark 2.3.

Kazuaki Ishizaki



From:Ofir Manor <ofir.ma...@equalum.io>
To:user <user@spark.apache.org>
Date:2017/08/08 02:04
Subject:tuning - Spark data serialization for cache() ?




Hi,
I'm using Spark 2.2, and have a big batch job, using dataframes (with 
built-in, basic types). It references the same intermediate dataframe 
multiple times, so I wanted to try to cache() that and see if it helps, 
both in memory footprint and performance.

Now, the Spark 2.2 tuning page (
http://spark.apache.org/docs/latest/tuning.html) clearly says:
1. The default Spark serialization is Java serialization.
2. It is recommended to switch to Kryo serialization.
3. "Since Spark 2.0.0, we internally use Kryo serializer when shuffling 
RDDs with simple types, arrays of simple types, or string type".

Now, I remember that at the 2.0 launch there was discussion of a third 
serialization format that is much more performant and compact (Encoders?), 
but it is not referenced in the tuning guide and its Scala doc is not very 
clear to me. Specifically, Databricks shared some graphs etc. of how much 
better it is than Kryo and Java serialization - see Encoders here:
https://databricks.com/blog/2016/01/04/introducing-apache-spark-datasets.html


So, is that relevant to cache()? If so, how can I enable it - and is it 
for MEMORY_AND_DISK_ONLY or MEMORY_AND_DISK_SER?

I tried to play with some other variations, like enabling Kryo per the 
tuning guide instructions, but didn't see any impact on the cached 
dataframe size (same tens of GBs in the UI). So any tips around that?

Thanks.
Ofir Manor
Co-Founder & CTO | Equalum
Mobile: +972-54-7801286 | Email: ofir.ma...@equalum.io






Re: tuning - Spark data serialization for cache() ?

2017-08-07 Thread Kazuaki Ishizaki
For Dataframe (and Dataset), cache() already uses fast 
serialization/deserialization with data compression schemes.

We have already identified some performance issues regarding cache(). We are 
working on alleviating these issues in 
https://issues.apache.org/jira/browse/SPARK-14098.
We expect that these PRs will be integrated into Spark 2.3.

Kazuaki Ishizaki



From:   Ofir Manor <ofir.ma...@equalum.io>
To: user <user@spark.apache.org>
Date:   2017/08/08 02:04
Subject:tuning - Spark data serialization for cache() ?



Hi,
I'm using Spark 2.2, and have a big batch job, using dataframes (with 
built-in, basic types). It references the same intermediate dataframe 
multiple times, so I wanted to try to cache() that and see if it helps, 
both in memory footprint and performance.

Now, the Spark 2.2 tuning page (
http://spark.apache.org/docs/latest/tuning.html) clearly says:
1. The default Spark serialization is Java serialization.
2. It is recommended to switch to Kryo serialization.
3. "Since Spark 2.0.0, we internally use Kryo serializer when shuffling 
RDDs with simple types, arrays of simple types, or string type".

Now, I remember that at the 2.0 launch there was discussion of a third 
serialization format that is much more performant and compact (Encoders?), 
but it is not referenced in the tuning guide and its Scala doc is not very 
clear to me. Specifically, Databricks shared some graphs etc. of how much 
better it is than Kryo and Java serialization - see Encoders here:
https://databricks.com/blog/2016/01/04/introducing-apache-spark-datasets.html

So, is that relevant to cache()? If so, how can I enable it - and is it 
for MEMORY_AND_DISK_ONLY or MEMORY_AND_DISK_SER?

I tried to play with some other variations, like enabling Kryo per the 
tuning guide instructions, but didn't see any impact on the cached 
dataframe size (same tens of GBs in the UI). So any tips around that?

Thanks.
Ofir Manor
Co-Founder & CTO | Equalum
Mobile: +972-54-7801286 | Email: ofir.ma...@equalum.io




Re: Welcoming Hyukjin Kwon and Sameer Agarwal as committers

2017-08-07 Thread Kazuaki Ishizaki
Congratulations, Hyukjin and Sameer, well deserved!!

Kazuaki Ishizaki



From:   Matei Zaharia <matei.zaha...@gmail.com>
To: dev <dev@spark.apache.org>
Date:   2017/08/08 00:53
Subject:Welcoming Hyukjin Kwon and Sameer Agarwal as committers



Hi everyone,

The Spark PMC recently voted to add Hyukjin Kwon and Sameer Agarwal as 
committers. Join me in congratulating both of them and thanking them for 
their contributions to the project!

Matei
-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org






[jira] [Commented] (SPARK-21390) Dataset filter api inconsistency

2017-08-02 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16110576#comment-16110576
 ] 

Kazuaki Ishizaki commented on SPARK-21390:
--

Thank you very much for pointing out the relevant JIRA entry. I will check it.

> Dataset filter api inconsistency
> 
>
> Key: SPARK-21390
> URL: https://issues.apache.org/jira/browse/SPARK-21390
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1, 2.1.0, 2.2.0
>Reporter: Gheorghe Gheorghe
>Priority: Minor
>
> Hello everybody, 
> I've encountered a strange situation with the spark-shell.
> When I run the code below in my IDE the second test case prints as expected 
> count "1". However, when I run the same code using the spark-shell in the 
> second test case I get 0 back as a count. 
> I've made sure that I'm running scala 2.11.8 and spark 2.0.1 in both my IDE 
> and spark-shell. 
> {code:java}
>   import org.apache.spark.sql.Dataset
>   case class SomeClass(field1:String, field2:String)
>   val filterCondition: Seq[SomeClass] = Seq( SomeClass("00", "01") )
>   // Test 1
>   val filterMe1: Dataset[SomeClass] = Seq( SomeClass("00", "01") ).toDS
>   
>   println("Works fine!" +filterMe1.filter(filterCondition.contains(_)).count)
>   
>   // Test 2
>   case class OtherClass(field1:String, field2:String)
>   
>   val filterMe2 = Seq( OtherClass("00", "01"), OtherClass("00", "02")).toDS
>   println("Fail, count should return 1: " + filterMe2.filter(x=> 
> filterCondition.contains(SomeClass(x.field1, x.field2))).count)
> {code}
> Note if I transform the dataset first I get 1 back as expected.
> {code:java}
>  println(filterMe2.map(x=> SomeClass(x.field1, 
> x.field2)).filter(filterCondition.contains(_)).count)
> {code}
> Is this a bug? I can see that this filter function has been marked as 
> experimental 
> https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21591) Implement treeAggregate on Dataset API

2017-08-01 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108644#comment-16108644
 ] 

Kazuaki Ishizaki commented on SPARK-21591:
--

I like this idea

> Implement treeAggregate on Dataset API
> --
>
> Key: SPARK-21591
> URL: https://issues.apache.org/jira/browse/SPARK-21591
> Project: Spark
>  Issue Type: Brainstorming
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Yanbo Liang
>
> The Tungsten execution engine substantially improved the efficiency of memory 
> and CPU for Spark applications. However, in MLlib we still have not migrated 
> the internal computing workload from {{RDD}} to {{DataFrame}}.
> One of the blocking issues is that there is no {{treeAggregate}} on 
> {{DataFrame}}. It's very important for MLlib algorithms, since they aggregate 
> over {{Vector}}s which may have millions of elements. As we all know, 
> {{RDD}}-based {{treeAggregate}} reduces the aggregation time by an order of 
> magnitude for lots of MLlib 
> algorithms(https://databricks.com/blog/2014/09/22/spark-1-1-mllib-performance-improvements.html).
> I am opening this JIRA to discuss implementing {{treeAggregate}} on the 
> {{DataFrame}} API and the related performance benchmark work. I think other 
> scenarios besides MLlib will also benefit from this improvement if we get 
> it done.
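
For reference, a small sketch of the existing RDD-based {{treeAggregate}} that 
the description refers to (the {{sumVectors}} helper and dimensions are 
illustrative only):
{code}
import org.apache.spark.SparkContext

// Sums dense vectors with two levels of partial aggregation instead of sending
// every partition's result straight to the driver.
def sumVectors(sc: SparkContext, data: Seq[Array[Double]], dim: Int): Array[Double] =
  sc.parallelize(data).treeAggregate(new Array[Double](dim))(
    seqOp = (acc, v) => { var i = 0; while (i < dim) { acc(i) += v(i); i += 1 }; acc },
    combOp = (a, b) => { var i = 0; while (i < dim) { a(i) += b(i); i += 1 }; a },
    depth = 2)
{code}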



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18016) Code Generation: Constant Pool Past Limit for Wide/Nested Dataset

2017-07-27 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104284#comment-16104284
 ] 

Kazuaki Ishizaki commented on SPARK-18016:
--

[~jamcon] Thank you for reporting the problem.
We fixed a problem for a large number (e.g. 4000) of columns. However, we know 
that we have not yet solved the problem for a very large number (e.g. 12000) 
of columns.
I have just pinged the author who created the fix intended to solve these two problems.


> Code Generation: Constant Pool Past Limit for Wide/Nested Dataset
> -
>
> Key: SPARK-18016
> URL: https://issues.apache.org/jira/browse/SPARK-18016
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Aleksander Eskilson
>Assignee: Aleksander Eskilson
> Fix For: 2.3.0
>
>
> When attempting to encode collections of large Java objects to Datasets 
> having very wide or deeply nested schemas, code generation can fail, yielding:
> {code}
> Caused by: org.codehaus.janino.JaninoRuntimeException: Constant pool for 
> class 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection
>  has grown past JVM limit of 0x
>   at 
> org.codehaus.janino.util.ClassFile.addToConstantPool(ClassFile.java:499)
>   at 
> org.codehaus.janino.util.ClassFile.addConstantNameAndTypeInfo(ClassFile.java:439)
>   at 
> org.codehaus.janino.util.ClassFile.addConstantMethodrefInfo(ClassFile.java:358)
>   at 
> org.codehaus.janino.UnitCompiler.writeConstantMethodrefInfo(UnitCompiler.java:4)
>   at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4547)
>   at org.codehaus.janino.UnitCompiler.access$7500(UnitCompiler.java:206)
>   at 
> org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:3774)
>   at 
> org.codehaus.janino.UnitCompiler$12.visitMethodInvocation(UnitCompiler.java:3762)
>   at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:4328)
>   at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3762)
>   at 
> org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4933)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:3180)
>   at org.codehaus.janino.UnitCompiler.access$5000(UnitCompiler.java:206)
>   at 
> org.codehaus.janino.UnitCompiler$9.visitMethodInvocation(UnitCompiler.java:3151)
>   at 
> org.codehaus.janino.UnitCompiler$9.visitMethodInvocation(UnitCompiler.java:3139)
>   at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:4328)
>   at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3139)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2112)
>   at org.codehaus.janino.UnitCompiler.access$1700(UnitCompiler.java:206)
>   at 
> org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1377)
>   at 
> org.codehaus.janino.UnitCompiler$6.visitExpressionStatement(UnitCompiler.java:1370)
>   at org.codehaus.janino.Java$ExpressionStatement.accept(Java.java:2558)
>   at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:1370)
>   at 
> org.codehaus.janino.UnitCompiler.compileStatements(UnitCompiler.java:1450)
>   at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:2811)
>   at 
> org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1262)
>   at 
> org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1234)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:538)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:890)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:894)
>   at org.codehaus.janino.UnitCompiler.access$600(UnitCompiler.java:206)
>   at 
> org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:377)
>   at 
> org.codehaus.janino.UnitCompiler$2.visitMemberClassDeclaration(UnitCompiler.java:369)
>   at 
> org.codehaus.janino.Java$MemberClassDeclaration.accept(Java.java:1128)
>   at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:369)
>   at 
> org.codehaus.janino.UnitCompiler.compileDeclaredMemberTypes(UnitCompiler.java:1209)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:564)
>   at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:420)
>   at org.codehaus.janino.UnitCompiler.access$400(UnitCompiler.java:206)
>   at 
>

[jira] [Commented] (SPARK-21496) Support codegen for TakeOrderedAndProjectExec

2017-07-25 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099840#comment-16099840
 ] 

Kazuaki Ishizaki commented on SPARK-21496:
--

Is there any good benchmark program for this?

> Support codegen for TakeOrderedAndProjectExec
> -
>
> Key: SPARK-21496
> URL: https://issues.apache.org/jira/browse/SPARK-21496
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Jiang Xingbo
>Priority: Minor
>
> The operator `SortExec` supports codegen, but `TakeOrderedAndProjectExec` 
> doesn't. Perhaps we should also add codegen support for 
> `TakeOrderedAndProjectExec`, but we should also do benchmark for it carefully.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21517) Fetch local data via block manager cause oom

2017-07-25 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099654#comment-16099654
 ] 

Kazuaki Ishizaki commented on SPARK-21517:
--

Does it occur in Spark 2.2?

> Fetch local data via block manager cause oom
> 
>
> Key: SPARK-21517
> URL: https://issues.apache.org/jira/browse/SPARK-21517
> Project: Spark
>  Issue Type: Improvement
>  Components: Block Manager, Spark Core
>Affects Versions: 1.6.1, 2.1.0
>Reporter: zhoukang
>
> In our production cluster, OOM happens when NettyBlockRpcServer receives an 
> OpenBlocks message. The reason we observed is below:
> When BlockManagerManagedBuffer calls ChunkedByteBuffer#toNetty, it uses 
> Unpooled.wrappedBuffer(ByteBuffer... buffers), which uses the default 
> maxNumComponents=16 in the low-level CompositeByteBuf. When our component count 
> is bigger than 16, it will execute the following consolidation
> {code:java}
> private void consolidateIfNeeded() {
> int numComponents = this.components.size();
> if(numComponents > this.maxNumComponents) {
> int capacity = 
> ((CompositeByteBuf.Component)this.components.get(numComponents - 
> 1)).endOffset;
> ByteBuf consolidated = this.allocBuffer(capacity);
> for(int c = 0; c < numComponents; ++c) {
> CompositeByteBuf.Component c1 = 
> (CompositeByteBuf.Component)this.components.get(c);
> ByteBuf b = c1.buf;
> consolidated.writeBytes(b);
> c1.freeIfNecessary();
> }
> CompositeByteBuf.Component var7 = new 
> CompositeByteBuf.Component(consolidated);
> var7.endOffset = var7.length;
> this.components.clear();
> this.components.add(var7);
> }
> }
> {code}
> in CompositeByteBuf, which consumes some memory during the buffer copy.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21501) Spark shuffle index cache size should be memory based

2017-07-24 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099351#comment-16099351
 ] 

Kazuaki Ishizaki commented on SPARK-21501:
--

I see. I misunderstood the description.
You expect that the memory cache would still be usable even when the number of 
entries is larger than {{spark.shuffle.service.index.cache.entries}}, as long as 
the total cache size is not large.

> Spark shuffle index cache size should be memory based
> -
>
> Key: SPARK-21501
> URL: https://issues.apache.org/jira/browse/SPARK-21501
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 2.1.0
>Reporter: Thomas Graves
>
> Right now the spark shuffle service has a cache for index files. It is based 
> on a # of files cached (spark.shuffle.service.index.cache.entries). This can 
> cause issues if people have a lot of reducers because the size of each entry 
> can fluctuate based on the # of reducers. 
> We saw an issue with a job that had 17 reducers, and it caused the NM with the 
> spark shuffle service to use 700-800MB of memory in the NM by itself.
> We should change this cache to be memory based and only allow a certain amount 
> of memory to be used. When I say memory based, I mean the cache should have a 
> limit of, say, 100MB.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-21387) org.apache.spark.memory.TaskMemoryManager.allocatePage causes OOM

2017-07-24 Thread Kazuaki Ishizaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki closed SPARK-21387.

Resolution: Cannot Reproduce

> org.apache.spark.memory.TaskMemoryManager.allocatePage causes OOM
> -
>
> Key: SPARK-21387
> URL: https://issues.apache.org/jira/browse/SPARK-21387
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-21387) org.apache.spark.memory.TaskMemoryManager.allocatePage causes OOM

2017-07-24 Thread Kazuaki Ishizaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki closed SPARK-21387.

Resolution: Fixed

> org.apache.spark.memory.TaskMemoryManager.allocatePage causes OOM
> -
>
> Key: SPARK-21387
> URL: https://issues.apache.org/jira/browse/SPARK-21387
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-21387) org.apache.spark.memory.TaskMemoryManager.allocatePage causes OOM

2017-07-24 Thread Kazuaki Ishizaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki reopened SPARK-21387:
--

> org.apache.spark.memory.TaskMemoryManager.allocatePage causes OOM
> -
>
> Key: SPARK-21387
> URL: https://issues.apache.org/jira/browse/SPARK-21387
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21387) org.apache.spark.memory.TaskMemoryManager.allocatePage causes OOM

2017-07-24 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098545#comment-16098545
 ] 

Kazuaki Ishizaki commented on SPARK-21387:
--

While I got an OOM in my unit test, I have to reinvestigate whether the unit test 
reflects the actual restrictions.

> org.apache.spark.memory.TaskMemoryManager.allocatePage causes OOM
> -
>
> Key: SPARK-21387
> URL: https://issues.apache.org/jira/browse/SPARK-21387
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21501) Spark shuffle index cache size should be memory based

2017-07-24 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098531#comment-16098531
 ] 

Kazuaki Ishizaki commented on SPARK-21501:
--

I guess that using Spark 2.1 or a later version alleviates this issue thanks to 
SPARK-15074.

> Spark shuffle index cache size should be memory based
> -
>
> Key: SPARK-21501
> URL: https://issues.apache.org/jira/browse/SPARK-21501
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 2.0.0
>Reporter: Thomas Graves
>
> Right now the spark shuffle service has a cache for index files. It is based 
> on a # of files cached (spark.shuffle.service.index.cache.entries). This can 
> cause issues if people have a lot of reducers because the size of each entry 
> can fluctuate based on the # of reducers. 
> We saw an issue with a job that had 17 reducers, and it caused the NM with the 
> spark shuffle service to use 700-800MB of memory in the NM by itself.
> We should change this cache to be memory based and only allow a certain 
> memory size used. When I say memory based I mean the cache should have a 
> limit of say 100MB.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21516) overriding afterEach() in DatasetCacheSuite must call super.afterEach()

2017-07-23 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-21516:


 Summary: overriding afterEach() in DatasetCacheSuite must call 
super.afterEach()
 Key: SPARK-21516
 URL: https://issues.apache.org/jira/browse/SPARK-21516
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.0
Reporter: Kazuaki Ishizaki


When we override the {{afterEach()}} method in a test suite, we have to call 
{{super.afterEach()}}. This is a follow-up of SPARK-21512.
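
For illustration (a minimal sketch of mine, not the actual Spark test code), the usual ScalaTest pattern is:

{code:scala}
import org.scalatest.{BeforeAndAfterEach, FunSuite}

class ExampleSuite extends FunSuite with BeforeAndAfterEach {
  override def afterEach(): Unit = {
    try {
      // suite-specific cleanup, e.g. unpersisting any cached Datasets
    } finally {
      super.afterEach()  // always propagate to the parent trait
    }
  }

  test("example") {
    assert(1 + 1 === 2)
  }
}
{code}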



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-21512) DatasetCacheSuite needs to execute unpersistent after executing peristent

2017-07-23 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097565#comment-16097565
 ] 

Kazuaki Ishizaki edited comment on SPARK-21512 at 7/24/17 4:53 AM:
---

When {{DatasetCacheSuite}} is executed, the following warning messages appear. 
The dataset is still persistent when the second test case {{"persist and 
then rebind right encoder when join 2 datasets"}} runs, because the first test case 
{{"get storage level"}} made it persistent and never made it unpersistent.
Thus, when we run these test cases together, the second case does not actually make 
the dataset persistent. When we run only the second case, it does make the dataset 
persistent. It is not good that the behavior of the second test case changes. The 
first test case should correctly make the dataset unpersistent.

{code}
01:52:48.595 WARN org.apache.spark.sql.execution.CacheManager: Asked to cache 
already cached data.
01:52:48.692 WARN org.apache.spark.sql.execution.CacheManager: Asked to cache 
already cached data.
{code}
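
For illustration (a sketch of mine, not the actual suite; it assumes a test SparkSession and {{spark.implicits._}} in scope), the shape of the fix is to undo the caching at the end of the first test case:

{code:scala}
val ds = Seq(1, 2, 3).toDS()
ds.persist()
try {
  assert(ds.storageLevel.useMemory)  // the check the first test case is interested in
} finally {
  ds.unpersist()                     // leave no cached data behind for later test cases
}
{code}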


was (Author: kiszk):
When {DatasetCacheSuite} is executed, the following warning messages appear. 
Unpersistent dataset is made persistent in the second test case {{"persist and 
then rebind right encoder when join 2 datasets"}} after the first test case 
{{"get storage level"}} made it persistent.
Thus, we run these test cases, the second case does not perform to make dataset 
persistent. This is because in 
 When we run only the second case, it performs to make dataset persistent. It 
is not good to change behavior of the second test suite. The first test case 
should correctly make dataset unpersistent.

{code}
01:52:48.595 WARN org.apache.spark.sql.execution.CacheManager: Asked to cache 
already cached data.
01:52:48.692 WARN org.apache.spark.sql.execution.CacheManager: Asked to cache 
already cached data.
{code}

> DatasetCacheSuite needs to execute unpersistent after executing peristent
> -
>
> Key: SPARK-21512
> URL: https://issues.apache.org/jira/browse/SPARK-21512
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>Assignee: Kazuaki Ishizaki
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21512) DatasetCacheSuite needs to execute unpersistent after executing peristent

2017-07-23 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16097565#comment-16097565
 ] 

Kazuaki Ishizaki commented on SPARK-21512:
--

When {DatasetCacheSuite} is executed, the following warning messages appear. 
Unpersistent dataset is made persistent in the second test case {{"persist and 
then rebind right encoder when join 2 datasets"}} after the first test case 
{{"get storage level"}} made it persistent.
Thus, we run these test cases, the second case does not perform to make dataset 
persistent. This is because in 
 When we run only the second case, it performs to make dataset persistent. It 
is not good to change behavior of the second test suite. The first test case 
should correctly make dataset unpersistent.

{code}
01:52:48.595 WARN org.apache.spark.sql.execution.CacheManager: Asked to cache 
already cached data.
01:52:48.692 WARN org.apache.spark.sql.execution.CacheManager: Asked to cache 
already cached data.
{code}

> DatasetCacheSuite needs to execute unpersistent after executing peristent
> -
>
> Key: SPARK-21512
> URL: https://issues.apache.org/jira/browse/SPARK-21512
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21512) DatasetCacheSuite needs to execute unpersistent after executing peristent

2017-07-23 Thread Kazuaki Ishizaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki updated SPARK-21512:
-
Summary: DatasetCacheSuite needs to execute unpersistent after executing 
peristent  (was: DatasetCacheSuites need to execute unpersistent after 
executing peristent)

> DatasetCacheSuite needs to execute unpersistent after executing peristent
> -
>
> Key: SPARK-21512
> URL: https://issues.apache.org/jira/browse/SPARK-21512
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21512) DatasetCacheSuites need to execute unpersistent after executing peristent

2017-07-23 Thread Kazuaki Ishizaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki updated SPARK-21512:
-
Summary: DatasetCacheSuites need to execute unpersistent after executing 
peristent  (was: DatasetCacheSuite need to execute unpersistent after executing 
peristent)

> DatasetCacheSuites need to execute unpersistent after executing peristent
> -
>
> Key: SPARK-21512
> URL: https://issues.apache.org/jira/browse/SPARK-21512
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21512) DatasetCacheSuite need to execute unpersistent after executing peristent

2017-07-23 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-21512:


 Summary: DatasetCacheSuite need to execute unpersistent after 
executing peristent
 Key: SPARK-21512
 URL: https://issues.apache.org/jira/browse/SPARK-21512
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.0
Reporter: Kazuaki Ishizaki






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20822) Generate code to get value from ColumnVector in ColumnarBatch

2017-07-20 Thread Kazuaki Ishizaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki updated SPARK-20822:
-
Summary: Generate code to get value from ColumnVector in ColumnarBatch  
(was: Generate code to build table cache using ColumnarBatch and to get value 
from ColumnVector)

> Generate code to get value from ColumnVector in ColumnarBatch
> -
>
> Key: SPARK-20822
> URL: https://issues.apache.org/jira/browse/SPARK-20822
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20822) Generate code to get value from CachedBatchColumnVector in ColumnarBatch

2017-07-20 Thread Kazuaki Ishizaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki updated SPARK-20822:
-
Summary: Generate code to get value from CachedBatchColumnVector in 
ColumnarBatch  (was: Generate code to get value from ColumnVector in 
ColumnarBatch)

> Generate code to get value from CachedBatchColumnVector in ColumnarBatch
> 
>
> Key: SPARK-20822
> URL: https://issues.apache.org/jira/browse/SPARK-20822
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21443) Very long planning duration for queries with lots of operations

2017-07-17 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090242#comment-16090242
 ] 

Kazuaki Ishizaki commented on SPARK-21443:
--

These two optimizations, {{InferFiltersFromConstraints}} and {{PruneFilters}}, are 
known as time-consuming optimizations.

Since it is not easy to fix the root cause, the Spark community introduced 
an option, {{spark.sql.constraintPropagation.enabled}}, to disable these 
optimizations in [this PR|https://github.com/apache/spark/pull/17186].
Is it possible to alleviate the problem by using this option?
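
For example (a minimal sketch; {{spark}} is assumed to be the SparkSession from the repro below):

{code:scala}
// Disable constraint propagation, which these two rules depend on,
// before building the long chain of selects.
spark.conf.set("spark.sql.constraintPropagation.enabled", "false")
{code}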

> Very long planning duration for queries with lots of operations
> ---
>
> Key: SPARK-21443
> URL: https://issues.apache.org/jira/browse/SPARK-21443
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Structured Streaming
>Affects Versions: 2.2.0
>Reporter: Eyal Zituny
>
> Creating a streaming query with a large number of operations and fields (100+) 
> results in a very long query planning phase. In the example below, the planning 
> phase has taken 35 seconds while the actual batch execution took only 1.3 
> seconds.
> After some investigation, I have found out that the root causes of this are 2 
> optimizer rules which seem to take most of the planning time: 
> InferFiltersFromConstraints and PruneFilters.
> I would suggest the following:
> # fix the inefficient optimizer rules
> # add warn-level logging if a rule has taken more than xx ms
> # allow custom removal of optimizer rules (opposite to 
> spark.experimental.extraOptimizations)
> # reuse query plans (optional) where possible
> Reproducing this issue can be done with the below script, which simulates the 
> scenario:
> {code:java}
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.execution.streaming.MemoryStream
> import org.apache.spark.sql.streaming.StreamingQueryListener.{QueryProgressEvent, QueryStartedEvent, QueryTerminatedEvent}
> import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQueryListener}
> case class Product(pid: Long, name: String, price: Long, ts: Long = System.currentTimeMillis())
> case class Events(eventId: Long, eventName: String, productId: Long) {
>   def this(id: Long) = this(id, s"event$id", id % 100)
> }
> object SparkTestFlow {
>   def main(args: Array[String]): Unit = {
>     val spark = SparkSession
>       .builder
>       .appName("TestFlow")
>       .master("local[8]")
>       .getOrCreate()
>     spark.sqlContext.streams.addListener(new StreamingQueryListener {
>       override def onQueryTerminated(event: QueryTerminatedEvent): Unit = {}
>       override def onQueryProgress(event: QueryProgressEvent): Unit = {
>         if (event.progress.numInputRows > 0) {
>           println(event.progress.toString())
>         }
>       }
>       override def onQueryStarted(event: QueryStartedEvent): Unit = {}
>     })
>
>     import spark.implicits._
>     implicit val sclContext = spark.sqlContext
>     import org.apache.spark.sql.functions.expr
>     val seq = (1L to 100L).map(i => Product(i, s"name$i", 10L * i))
>     val lookupTable = spark.createDataFrame(seq)
>     val inputData = MemoryStream[Events]
>     inputData.addData((1L to 100L).map(i => new Events(i)))
>     val events = inputData.toDF()
>       .withColumn("w1", expr("0"))
>       .withColumn("x1", expr("0"))
>       .withColumn("y1", expr("0"))
>       .withColumn("z1", expr("0"))
>     val numberOfSelects = 40 // set to 100+ and the planning takes forever
>     val dfWithSelectsExpr = (2 to numberOfSelects).foldLeft(events)((df, i) => {
>       val arr = df.columns.++(Array(s"w${i-1} + rand() as w$i", s"x${i-1} + rand() as x$i", s"y${i-1} + 2 as y$i", s"z${i-1} +1 as z$i"))
>       df.selectExpr(arr: _*)
>     })
>     val withJoinAndFilter = dfWithSelectsExpr
>       .join(lookupTable, expr("productId = pid"))
>       .filter("productId < 50")
>

[jira] [Commented] (SPARK-21415) Triage scapegoat warnings, part 1

2017-07-17 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089915#comment-16089915
 ] 

Kazuaki Ishizaki commented on SPARK-21415:
--

I see. When another JIRA is filed for these triage scapegoat warnings, we 
could group them under an umbrella.

> Triage scapegoat warnings, part 1
> -
>
> Key: SPARK-21415
> URL: https://issues.apache.org/jira/browse/SPARK-21415
> Project: Spark
>  Issue Type: Improvement
>  Components: GraphX, ML, Spark Core, SQL, Structured Streaming
>Affects Versions: 2.3.0
>Reporter: Sean Owen
>Assignee: Sean Owen
>Priority: Minor
>
> Following the results of the scapegoat plugin at 
> https://docs.google.com/spreadsheets/d/1z7xNMjx7VCJLCiHOHhTth7Hh4R0F6LwcGjEwCDzrCiM/edit#gid=767668040
>  and some initial triage, I'd like to address all of the valid instances of 
> some classes of warning:
> - BigDecimal double constructor
> - Catching NPE
> - Finalizer without super
> - List.size is O(n)
> - Prefer Seq.empty
> - Prefer Set.empty
> - reverse.map instead of reverseMap
> - Type shadowing
> - Unnecessary if condition.
> - Use .log1p
> - Var could be val
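
For illustration (examples of mine, not taken from the Spark code base), a few of these warning classes correspond to simple rewrites:

{code:scala}
object ScapegoatExamples {
  // Prefer Seq.empty / Set.empty over constructing empty collections
  val noInts: Seq[Int] = Seq.empty[Int]
  val noNames: Set[String] = Set.empty[String]

  // BigDecimal double constructor: build from a String to avoid binary floating-point error
  val tenth = BigDecimal("0.1")

  // Use .log1p: more accurate than log(1 + x) for small x
  def logOnePlus(x: Double): Double = math.log1p(x)

  // reverse.map instead of reverseMap: avoids the intermediate reversed collection
  def trimmedReversed(names: Seq[String]): Seq[String] = names.reverseMap(_.trim)
}
{code}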



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21390) Dataset filter api inconsistency

2017-07-16 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089012#comment-16089012
 ] 

Kazuaki Ishizaki commented on SPARK-21390:
--

cc: [~ueshin] Do you have any thoughts on this?

> Dataset filter api inconsistency
> 
>
> Key: SPARK-21390
> URL: https://issues.apache.org/jira/browse/SPARK-21390
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1, 2.1.0, 2.2.0
>Reporter: Gheorghe Gheorghe
>Priority: Minor
>
> Hello everybody, 
> I've encountered a strange situation with the spark-shell.
> When I run the code below in my IDE the second test case prints as expected 
> count "1". However, when I run the same code using the spark-shell in the 
> second test case I get 0 back as a count. 
> I've made sure that I'm running scala 2.11.8 and spark 2.0.1 in both my IDE 
> and spark-shell. 
> {code:java}
>   import org.apache.spark.sql.Dataset
>   case class SomeClass(field1:String, field2:String)
>   val filterCondition: Seq[SomeClass] = Seq( SomeClass("00", "01") )
>   // Test 1
>   val filterMe1: Dataset[SomeClass] = Seq( SomeClass("00", "01") ).toDS
>   
>   println("Works fine!" +filterMe1.filter(filterCondition.contains(_)).count)
>   
>   // Test 2
>   case class OtherClass(field1:String, field2:String)
>   
>   val filterMe2 = Seq( OtherClass("00", "01"), OtherClass("00", "02")).toDS
>   println("Fail, count should return 1: " + filterMe2.filter(x=> 
> filterCondition.contains(SomeClass(x.field1, x.field2))).count)
> {code}
> Note if I transform the dataset first I get 1 back as expected.
> {code:java}
>  println(filterMe2.map(x=> SomeClass(x.field1, 
> x.field2)).filter(filterCondition.contains(_)).count)
> {code}
> Is this a bug? I can see that this filter function has been marked as 
> experimental 
> https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21390) Dataset filter api inconsistency

2017-07-16 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088977#comment-16088977
 ] 

Kazuaki Ishizaki commented on SPARK-21390:
--

When I ran the following test in ReplSuite.scala, I got an assertion failure 
at the last assertion.

{code:java}
  test("SPARK-21390: incorrect filter with case class") {
val output = runInterpreter("local",
  """
|case class SomeClass(f1: String, f2: String)
|val ds = Seq(SomeClass("a", "b")).toDS
|val filterCond = Seq(SomeClass("a", "b"))
|ds.filter(x => filterCond.contains(SomeClass(x.f1, x.f2))).show
  """.stripMargin)
print(s"$output\n")
assertDoesNotContain("error:", output)
assertDoesNotContain("Exception", output)
assertContains("|  a| b|", output)
  }
{code}

> Dataset filter api inconsistency
> 
>
> Key: SPARK-21390
> URL: https://issues.apache.org/jira/browse/SPARK-21390
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1, 2.1.0, 2.2.0
>Reporter: Gheorghe Gheorghe
>Priority: Minor
>
> Hello everybody, 
> I've encountered a strange situation with the spark-shell.
> When I run the code below in my IDE the second test case prints as expected 
> count "1". However, when I run the same code using the spark-shell in the 
> second test case I get 0 back as a count. 
> I've made sure that I'm running scala 2.11.8 and spark 2.0.1 in both my IDE 
> and spark-shell. 
> {code:java}
>   import org.apache.spark.sql.Dataset
>   case class SomeClass(field1:String, field2:String)
>   val filterCondition: Seq[SomeClass] = Seq( SomeClass("00", "01") )
>   // Test 1
>   val filterMe1: Dataset[SomeClass] = Seq( SomeClass("00", "01") ).toDS
>   
>   println("Works fine!" +filterMe1.filter(filterCondition.contains(_)).count)
>   
>   // Test 2
>   case class OtherClass(field1:String, field2:String)
>   
>   val filterMe2 = Seq( OtherClass("00", "01"), OtherClass("00", "02")).toDS
>   println("Fail, count should return 1: " + filterMe2.filter(x=> 
> filterCondition.contains(SomeClass(x.field1, x.field2))).count)
> {code}
> Note if I transform the dataset first I get 1 back as expected.
> {code:java}
>  println(filterMe2.map(x=> SomeClass(x.field1, 
> x.field2)).filter(filterCondition.contains(_)).count)
> {code}
> Is this a bug? I can see that this filter function has been marked as 
> experimental 
> https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-21418) NoSuchElementException: None.get on DataFrame.rdd

2017-07-16 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088659#comment-16088659
 ] 

Kazuaki Ishizaki edited comment on SPARK-21418 at 7/16/17 8:25 AM:
---

I am curious why {{java.io.ObjectOutputStream.writeOrdinaryObject}} calls the 
{{toString}} method. Do you specify any JVM options when running this program?


was (Author: kiszk):
I am curious why {{java.io.ObjectOutputStream.writeOrdinaryObject}} calls 
`toString` method. Do you specify some option to run this program for JVM?

> NoSuchElementException: None.get on DataFrame.rdd
> -
>
> Key: SPARK-21418
> URL: https://issues.apache.org/jira/browse/SPARK-21418
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Daniel Darabos
>
> I don't have a minimal reproducible example yet, sorry. I have the following 
> lines in a unit test for our Spark application:
> {code}
> val df = mySparkSession.read.format("jdbc")
>   .options(Map("url" -> url, "dbtable" -> "test_table"))
>   .load()
> df.show
> println(df.rdd.collect)
> {code}
> The output shows the DataFrame contents from {{df.show}}. But the {{collect}} 
> fails:
> {noformat}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 
> serialization failed: java.util.NoSuchElementException: None.get
> java.util.NoSuchElementException: None.get
>   at scala.None$.get(Option.scala:347)
>   at scala.None$.get(Option.scala:345)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$class.org$apache$spark$sql$execution$DataSourceScanExec$$redact(DataSourceScanExec.scala:70)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$$anonfun$4.apply(DataSourceScanExec.scala:54)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$$anonfun$4.apply(DataSourceScanExec.scala:52)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$class.simpleString(DataSourceScanExec.scala:52)
>   at 
> org.apache.spark.sql.execution.RowDataSourceScanExec.simpleString(DataSourceScanExec.scala:75)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.verboseString(QueryPlan.scala:349)
>   at 
> org.apache.spark.sql.execution.RowDataSourceScanExec.org$apache$spark$sql$execution$DataSourceScanExec$$super$verboseString(DataSourceScanExec.scala:75)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$class.verboseString(DataSourceScanExec.scala:60)
>   at 
> org.apache.spark.sql.execution.RowDataSourceScanExec.verboseString(DataSourceScanExec.scala:75)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:556)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec.generateTreeString(WholeStageCodegenExec.scala:451)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:576)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:480)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:477)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.toString(TreeNode.scala:474)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1421)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at java.io.ObjectOutputStr

[jira] [Comment Edited] (SPARK-21393) spark (pyspark) crashes unpredictably when using show() or toPandas()

2017-07-15 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088660#comment-16088660
 ] 

Kazuaki Ishizaki edited comment on SPARK-21393 at 7/15/17 4:53 PM:
---

I confirmed that this python program works well without an exception after 
applying a PR for SPARK-21413.


was (Author: kiszk):
I confirmed that this python program works well after applying a PR for 
SPARK-21413.

> spark (pyspark) crashes unpredictably when using show() or toPandas()
> -
>
> Key: SPARK-21393
> URL: https://issues.apache.org/jira/browse/SPARK-21393
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.1.1, 2.2.0
> Environment: Windows 10
> python 2.7
>Reporter: Zahra
> Attachments: Data.zip, working_ST_pyspark.py
>
>
> I unpredictably run into this error when using either 
> `pyspark.sql.DataFrame.show()` or `pyspark.sql.DataFrame.toPandas()`.
> The error log starts with (truncated):
> {noformat}
> 17/07/12 16:03:09 ERROR CodeGenerator: failed to compile: 
> org.codehaus.janino.JaninoRuntimeException: Code of method 
> "apply_47$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V"
>  of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection"
>  grows beyond 64 KB
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */   return new SpecificUnsafeProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificUnsafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */   private scala.collection.immutable.Set set;
> /* 009 */   private scala.collection.immutable.Set set1;
> /* 010 */   private scala.collection.immutable.Set set2;
> /* 011 */   private scala.collection.immutable.Set set3;
> /* 012 */   private UTF8String.IntWrapper wrapper;
> /* 013 */   private UTF8String.IntWrapper wrapper1;
> /* 014 */   private scala.collection.immutable.Set set4;
> /* 015 */   private UTF8String.IntWrapper wrapper2;
> /* 016 */   private UTF8String.IntWrapper wrapper3;
> /* 017 */   private scala.collection.immutable.Set set5;
> /* 018 */   private scala.collection.immutable.Set set6;
> /* 019 */   private scala.collection.immutable.Set set7;
> /* 020 */   private UTF8String.IntWrapper wrapper4;
> /* 021 */   private UTF8String.IntWrapper wrapper5;
> /* 022 */   private scala.collection.immutable.Set set8;
> /* 023 */   private UTF8String.IntWrapper wrapper6;
> /* 024 */   private UTF8String.IntWrapper wrapper7;
> /* 025 */   private scala.collection.immutable.Set set9;
> /* 026 */   private scala.collection.immutable.Set set10;
> /* 027 */   private scala.collection.immutable.Set set11;
> /* 028 */   private UTF8String.IntWrapper wrapper8;
> /* 029 */   private UTF8String.IntWrapper wrapper9;
> /* 030 */   private scala.collection.immutable.Set set12;
> /* 031 */   private UTF8String.IntWrapper wrapper10;
> /* 032 */   private UTF8String.IntWrapper wrapper11;
> /* 033 */   private scala.collection.immutable.Set set13;
> /* 034 */   private scala.collection.immutable.Set set14;
> /* 035 */   private scala.collection.immutable.Set set15;
> /* 036 */   private UTF8String.IntWrapper wrapper12;
> /* 037 */   private UTF8String.IntWrapper wrapper13;
> /* 038 */   private scala.collection.immutable.Set set16;
> /* 039 */   private UTF8String.IntWrapper wrapper14;
> /* 040 */   private UTF8String.IntWrapper wrapper15;
> /* 041 */   private scala.collection.immutable.Set set17;
> /* 042 */   private scala.collection.immutable.Set set18;
> /* 043 */   private scala.collection.immutable.Set set19;
> /* 044 */   private UTF8String.IntWrapper wrapper16;
> /* 045 */   private UTF8String.IntWrapper wrapper17;
> /* 046 */   private scala.collection.immutable.Set set20;
> /* 047 */   private UTF8String.IntWrapper wrapper18;
> /* 048 */   private UTF8String.IntWrapper wrapper19;
> /* 049 */   private scala.collection.immutable.Set set21;
> /* 050 */   private scala.collection.immutable.Set set22;
> /* 051 */   private scala.collection.immutable.Set set23;
> /* 052 */   private UTF8String.IntWrapper wrapper20;
> /* 053 */   private UTF8String.IntWrapper wrapper21;
> /* 054 */   private scala.collection.immutable.Set set24;
> /* 055 */   private UTF8String.IntWrapper wrapper22;
> /* 056 */   private UTF8String.IntWrapper wrapper23;
> /* 057 

[jira] [Commented] (SPARK-21393) spark (pyspark) crashes unpredictably when using show() or toPandas()

2017-07-15 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088660#comment-16088660
 ] 

Kazuaki Ishizaki commented on SPARK-21393:
--

I confirmed that this python program works well after applying a PR for 
SPARK-21413.

> spark (pyspark) crashes unpredictably when using show() or toPandas()
> -
>
> Key: SPARK-21393
> URL: https://issues.apache.org/jira/browse/SPARK-21393
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.1.1, 2.2.0
> Environment: Windows 10
> python 2.7
>Reporter: Zahra
> Attachments: Data.zip, working_ST_pyspark.py
>
>
> I unpredictably run into this error when using either 
> `pyspark.sql.DataFrame.show()` or `pyspark.sql.DataFrame.toPandas()`.
> The error log starts with (truncated):
> {noformat}
> 17/07/12 16:03:09 ERROR CodeGenerator: failed to compile: 
> org.codehaus.janino.JaninoRuntimeException: Code of method 
> "apply_47$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V"
>  of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection"
>  grows beyond 64 KB
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */   return new SpecificUnsafeProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificUnsafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */   private scala.collection.immutable.Set set;
> /* 009 */   private scala.collection.immutable.Set set1;
> /* 010 */   private scala.collection.immutable.Set set2;
> /* 011 */   private scala.collection.immutable.Set set3;
> /* 012 */   private UTF8String.IntWrapper wrapper;
> /* 013 */   private UTF8String.IntWrapper wrapper1;
> /* 014 */   private scala.collection.immutable.Set set4;
> /* 015 */   private UTF8String.IntWrapper wrapper2;
> /* 016 */   private UTF8String.IntWrapper wrapper3;
> /* 017 */   private scala.collection.immutable.Set set5;
> /* 018 */   private scala.collection.immutable.Set set6;
> /* 019 */   private scala.collection.immutable.Set set7;
> /* 020 */   private UTF8String.IntWrapper wrapper4;
> /* 021 */   private UTF8String.IntWrapper wrapper5;
> /* 022 */   private scala.collection.immutable.Set set8;
> /* 023 */   private UTF8String.IntWrapper wrapper6;
> /* 024 */   private UTF8String.IntWrapper wrapper7;
> /* 025 */   private scala.collection.immutable.Set set9;
> /* 026 */   private scala.collection.immutable.Set set10;
> /* 027 */   private scala.collection.immutable.Set set11;
> /* 028 */   private UTF8String.IntWrapper wrapper8;
> /* 029 */   private UTF8String.IntWrapper wrapper9;
> /* 030 */   private scala.collection.immutable.Set set12;
> /* 031 */   private UTF8String.IntWrapper wrapper10;
> /* 032 */   private UTF8String.IntWrapper wrapper11;
> /* 033 */   private scala.collection.immutable.Set set13;
> /* 034 */   private scala.collection.immutable.Set set14;
> /* 035 */   private scala.collection.immutable.Set set15;
> /* 036 */   private UTF8String.IntWrapper wrapper12;
> /* 037 */   private UTF8String.IntWrapper wrapper13;
> /* 038 */   private scala.collection.immutable.Set set16;
> /* 039 */   private UTF8String.IntWrapper wrapper14;
> /* 040 */   private UTF8String.IntWrapper wrapper15;
> /* 041 */   private scala.collection.immutable.Set set17;
> /* 042 */   private scala.collection.immutable.Set set18;
> /* 043 */   private scala.collection.immutable.Set set19;
> /* 044 */   private UTF8String.IntWrapper wrapper16;
> /* 045 */   private UTF8String.IntWrapper wrapper17;
> /* 046 */   private scala.collection.immutable.Set set20;
> /* 047 */   private UTF8String.IntWrapper wrapper18;
> /* 048 */   private UTF8String.IntWrapper wrapper19;
> /* 049 */   private scala.collection.immutable.Set set21;
> /* 050 */   private scala.collection.immutable.Set set22;
> /* 051 */   private scala.collection.immutable.Set set23;
> /* 052 */   private UTF8String.IntWrapper wrapper20;
> /* 053 */   private UTF8String.IntWrapper wrapper21;
> /* 054 */   private scala.collection.immutable.Set set24;
> /* 055 */   private UTF8String.IntWrapper wrapper22;
> /* 056 */   private UTF8String.IntWrapper wrapper23;
> /* 057 */   private scala.collection.immutable.Set set25;
> /* 058 */   private scala.collection.immutable.Set set26;
> /* 059 */   private scala.collection.immutable.Set set27;
>

[jira] [Comment Edited] (SPARK-21418) NoSuchElementException: None.get on DataFrame.rdd

2017-07-15 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088659#comment-16088659
 ] 

Kazuaki Ishizaki edited comment on SPARK-21418 at 7/15/17 4:43 PM:
---

I am curious why {{java.io.ObjectOutputStream.writeOrdinaryObject}} calls 
`toString` method. Do you specify some option to run this program for JVM?


was (Author: kiszk):
I am curious why {java.io.ObjectOutputStream.writeOrdinaryObject} calls 
`toString` method. Do you specify some option to run this program for JVM?

> NoSuchElementException: None.get on DataFrame.rdd
> -
>
> Key: SPARK-21418
> URL: https://issues.apache.org/jira/browse/SPARK-21418
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Daniel Darabos
>
> I don't have a minimal reproducible example yet, sorry. I have the following 
> lines in a unit test for our Spark application:
> {code}
> val df = mySparkSession.read.format("jdbc")
>   .options(Map("url" -> url, "dbtable" -> "test_table"))
>   .load()
> df.show
> println(df.rdd.collect)
> {code}
> The output shows the DataFrame contents from {{df.show}}. But the {{collect}} 
> fails:
> {noformat}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 
> serialization failed: java.util.NoSuchElementException: None.get
> java.util.NoSuchElementException: None.get
>   at scala.None$.get(Option.scala:347)
>   at scala.None$.get(Option.scala:345)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$class.org$apache$spark$sql$execution$DataSourceScanExec$$redact(DataSourceScanExec.scala:70)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$$anonfun$4.apply(DataSourceScanExec.scala:54)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$$anonfun$4.apply(DataSourceScanExec.scala:52)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$class.simpleString(DataSourceScanExec.scala:52)
>   at 
> org.apache.spark.sql.execution.RowDataSourceScanExec.simpleString(DataSourceScanExec.scala:75)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.verboseString(QueryPlan.scala:349)
>   at 
> org.apache.spark.sql.execution.RowDataSourceScanExec.org$apache$spark$sql$execution$DataSourceScanExec$$super$verboseString(DataSourceScanExec.scala:75)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$class.verboseString(DataSourceScanExec.scala:60)
>   at 
> org.apache.spark.sql.execution.RowDataSourceScanExec.verboseString(DataSourceScanExec.scala:75)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:556)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec.generateTreeString(WholeStageCodegenExec.scala:451)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:576)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:480)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:477)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.toString(TreeNode.scala:474)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1421)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at java.io.ObjectOutputStr

[jira] [Commented] (SPARK-21418) NoSuchElementException: None.get on DataFrame.rdd

2017-07-15 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16088659#comment-16088659
 ] 

Kazuaki Ishizaki commented on SPARK-21418:
--

I am curious why {java.io.ObjectOutputStream.writeOrdinaryObject} calls 
`toString` method. Do you specify some option to run this program for JVM?

> NoSuchElementException: None.get on DataFrame.rdd
> -
>
> Key: SPARK-21418
> URL: https://issues.apache.org/jira/browse/SPARK-21418
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Daniel Darabos
>
> I don't have a minimal reproducible example yet, sorry. I have the following 
> lines in a unit test for our Spark application:
> {code}
> val df = mySparkSession.read.format("jdbc")
>   .options(Map("url" -> url, "dbtable" -> "test_table"))
>   .load()
> df.show
> println(df.rdd.collect)
> {code}
> The output shows the DataFrame contents from {{df.show}}. But the {{collect}} 
> fails:
> {noformat}
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 
> serialization failed: java.util.NoSuchElementException: None.get
> java.util.NoSuchElementException: None.get
>   at scala.None$.get(Option.scala:347)
>   at scala.None$.get(Option.scala:345)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$class.org$apache$spark$sql$execution$DataSourceScanExec$$redact(DataSourceScanExec.scala:70)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$$anonfun$4.apply(DataSourceScanExec.scala:54)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$$anonfun$4.apply(DataSourceScanExec.scala:52)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$class.simpleString(DataSourceScanExec.scala:52)
>   at 
> org.apache.spark.sql.execution.RowDataSourceScanExec.simpleString(DataSourceScanExec.scala:75)
>   at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.verboseString(QueryPlan.scala:349)
>   at 
> org.apache.spark.sql.execution.RowDataSourceScanExec.org$apache$spark$sql$execution$DataSourceScanExec$$super$verboseString(DataSourceScanExec.scala:75)
>   at 
> org.apache.spark.sql.execution.DataSourceScanExec$class.verboseString(DataSourceScanExec.scala:60)
>   at 
> org.apache.spark.sql.execution.RowDataSourceScanExec.verboseString(DataSourceScanExec.scala:75)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:556)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec.generateTreeString(WholeStageCodegenExec.scala:451)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.generateTreeString(TreeNode.scala:576)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:480)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.treeString(TreeNode.scala:477)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.toString(TreeNode.scala:474)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1421)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
>   at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
>   at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
>   at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
>   at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.jav

[jira] [Commented] (SPARK-21393) spark (pyspark) crashes unpredictably when using show() or toPandas()

2017-07-14 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087391#comment-16087391
 ] 

Kazuaki Ishizaki commented on SPARK-21393:
--

Not yet; however, I created a patch that avoids the failure for the program in 
SPARK-21413.
I will submit a pull request once I can create a test suite for this patch. 
Then, I expect that it will be merged into master.

> spark (pyspark) crashes unpredictably when using show() or toPandas()
> -
>
> Key: SPARK-21393
> URL: https://issues.apache.org/jira/browse/SPARK-21393
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.1.1, 2.2.0
> Environment: Windows 10
> python 2.7
>Reporter: Zahra
> Attachments: Data.zip, working_ST_pyspark.py
>
>
> I unpredictably run into this error when using either 
> `pyspark.sql.DataFrame.show()` or `pyspark.sql.DataFrame.toPandas()`.
> The error log starts with (truncated):
> {noformat}
> 17/07/12 16:03:09 ERROR CodeGenerator: failed to compile: 
> org.codehaus.janino.JaninoRuntimeException: Code of method 
> "apply_47$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V"
>  of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection"
>  grows beyond 64 KB
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */   return new SpecificUnsafeProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificUnsafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */   private scala.collection.immutable.Set set;
> /* 009 */   private scala.collection.immutable.Set set1;
> /* 010 */   private scala.collection.immutable.Set set2;
> /* 011 */   private scala.collection.immutable.Set set3;
> /* 012 */   private UTF8String.IntWrapper wrapper;
> /* 013 */   private UTF8String.IntWrapper wrapper1;
> /* 014 */   private scala.collection.immutable.Set set4;
> /* 015 */   private UTF8String.IntWrapper wrapper2;
> /* 016 */   private UTF8String.IntWrapper wrapper3;
> /* 017 */   private scala.collection.immutable.Set set5;
> /* 018 */   private scala.collection.immutable.Set set6;
> /* 019 */   private scala.collection.immutable.Set set7;
> /* 020 */   private UTF8String.IntWrapper wrapper4;
> /* 021 */   private UTF8String.IntWrapper wrapper5;
> /* 022 */   private scala.collection.immutable.Set set8;
> /* 023 */   private UTF8String.IntWrapper wrapper6;
> /* 024 */   private UTF8String.IntWrapper wrapper7;
> /* 025 */   private scala.collection.immutable.Set set9;
> /* 026 */   private scala.collection.immutable.Set set10;
> /* 027 */   private scala.collection.immutable.Set set11;
> /* 028 */   private UTF8String.IntWrapper wrapper8;
> /* 029 */   private UTF8String.IntWrapper wrapper9;
> /* 030 */   private scala.collection.immutable.Set set12;
> /* 031 */   private UTF8String.IntWrapper wrapper10;
> /* 032 */   private UTF8String.IntWrapper wrapper11;
> /* 033 */   private scala.collection.immutable.Set set13;
> /* 034 */   private scala.collection.immutable.Set set14;
> /* 035 */   private scala.collection.immutable.Set set15;
> /* 036 */   private UTF8String.IntWrapper wrapper12;
> /* 037 */   private UTF8String.IntWrapper wrapper13;
> /* 038 */   private scala.collection.immutable.Set set16;
> /* 039 */   private UTF8String.IntWrapper wrapper14;
> /* 040 */   private UTF8String.IntWrapper wrapper15;
> /* 041 */   private scala.collection.immutable.Set set17;
> /* 042 */   private scala.collection.immutable.Set set18;
> /* 043 */   private scala.collection.immutable.Set set19;
> /* 044 */   private UTF8String.IntWrapper wrapper16;
> /* 045 */   private UTF8String.IntWrapper wrapper17;
> /* 046 */   private scala.collection.immutable.Set set20;
> /* 047 */   private UTF8String.IntWrapper wrapper18;
> /* 048 */   private UTF8String.IntWrapper wrapper19;
> /* 049 */   private scala.collection.immutable.Set set21;
> /* 050 */   private scala.collection.immutable.Set set22;
> /* 051 */   private scala.collection.immutable.Set set23;
> /* 052 */   private UTF8String.IntWrapper wrapper20;
> /* 053 */   private UTF8String.IntWrapper wrapper21;
> /* 054 */   private scala.collection.immutable.Set set24;
> /* 055 */   private UTF8String.IntWrapper wrapper22;
> /* 056 */   private UTF8String.IntWrapper wrapper23;
> /* 057 */   private scala.collection.imm

[jira] [Commented] (SPARK-21415) Triage scapegoat warnings, part 1

2017-07-14 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16087097#comment-16087097
 ] 

Kazuaki Ishizaki commented on SPARK-21415:
--

Thank you. Would it be better to create an umbrella JIRA entry for the triage 
scapegoat work?

> Triage scapegoat warnings, part 1
> -
>
> Key: SPARK-21415
> URL: https://issues.apache.org/jira/browse/SPARK-21415
> Project: Spark
>  Issue Type: Improvement
>  Components: GraphX, ML, Spark Core, SQL, Structured Streaming
>Affects Versions: 2.3.0
>Reporter: Sean Owen
>Assignee: Sean Owen
>Priority: Minor
>
> Following the results of the scapegoat plugin at 
> https://docs.google.com/spreadsheets/d/1z7xNMjx7VCJLCiHOHhTth7Hh4R0F6LwcGjEwCDzrCiM/edit#gid=767668040
>  and some initial triage, I'd like to address all of the valid instances of 
> some classes of warning:
> - BigDecimal double constructor
> - Catching NPE
> - Finalizer without super
> - List.size is O(n)
> - Prefer Seq.empty
> - Prefer Set.empty
> - reverse.map instead of reverseMap
> - Type shadowing
> - Unnecessary if condition.
> - Use .log1p
> - Var could be val



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21413) Multiple projections with CASE WHEN fails to run generated codes

2017-07-14 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086927#comment-16086927
 ] 

Kazuaki Ishizaki commented on SPARK-21413:
--

Thank you for preparing a good repro. I can reproduce this problem. I think 
that this can cause the same problem as SPARK-21393.
I am working on this.

> Multiple projections with CASE WHEN fails to run generated codes
> 
>
> Key: SPARK-21413
> URL: https://issues.apache.org/jira/browse/SPARK-21413
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Hyukjin Kwon
>
> Scala codes to reproduce are as below:
> {code}
> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.types._
> import org.apache.spark.sql.Row
> val schema = StructType(StructField("fieldA", IntegerType) :: Nil)
> var df = spark.createDataFrame(spark.sparkContext.parallelize(Seq(Row(1))), 
> schema)
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df.show()
> {code}
> Calling {{explain()}} on the dataframe in the former case shows a huge 
> case-when projection and {{show()}} fails with the exception as below:
> {code}
> ...
> Caused by: org.codehaus.janino.JaninoRuntimeException: Code of method 
> "apply_0$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V"
>  of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection"
>  grows beyond 64 KB
>   at org.codehaus.janino.CodeContext.makeSpace(CodeContext.java:949)
>   at org.codehaus.janino.CodeContext.write(CodeContext.java:839)
>   at org.codehaus.janino.UnitCompiler.writeOpcode(UnitCompiler.java:11081)
>   at org.codehaus.janino.UnitCompiler.pushConstant(UnitCompiler.java:9674)
>   at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4911)
>   at org.codehaus.janino.UnitCompiler.access$7700(UnitCompiler.java:206)
>   at 
> org.codehaus.janino.UnitCompiler$12.visitIntegerLiteral(UnitCompiler.java:3776)
> ...
> {code}
> Note that, I could not reproduce this with local relation (this one appears 
> by {{ConvertToLocalRelation}}).
> {code}
> import org.apache.spark.sql.functions._
> var df = Seq(1).toDF("fieldA")
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df = df.withColumn("fieldA", when($"fieldA" === 0, null).otherwise($"fieldA"))
> df.show()
> {code}






[jira] [Commented] (SPARK-21393) spark (pyspark) crashes unpredictably when using show() or toPandas()

2017-07-13 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086254#comment-16086254
 ] 

Kazuaki Ishizaki commented on SPARK-21393:
--

This program causes the same exception:

{code}
from __future__ import absolute_import, division, print_function

import findspark
findspark.init()
import pyspark
from pyspark.sql.functions import *

from pyspark import SparkContext
from pyspark import SparkConf
from pyspark.sql import SQLContext
import pyspark.sql.functions as sf

sc = SparkContext()
sqlContext = SQLContext(sc)
### data
df = sqlContext.read.load('./Data/claims.csv', 
format='com.databricks.spark.csv', header=True)

df_new = df.withColumn('service_type_col',sf.when((sf.col('RevenueCategory') == 
"Emergency Room") | (sf.col('CPT_Name') == "EMERGENCY DEPT VISIT"), 
'EMERGENCY_CARE').otherwise(0))
df_new = df_new.withColumn('service_type_col', 
sf.when((sf.col('ProcedureCategory').isin([ "Laboratory, General"])) & 
(sf.col('service_type_col') == 0), 
'LAB_AND_PATHOLOGY').otherwise(df_new.service_type_col))
df_new = df_new.withColumn('service_type_col', 
sf.when((sf.col('service_type_col') == 0), 
'ROUTINE_RADIOLOGY').otherwise(df_new.service_type_col))
df_new = df_new.withColumn('service_type_col', 
sf.when((sf.col('CPT_Code').isin(["70336"])) & (sf.col('service_type_col') == 
0), 'ADVANCED_IMAGING').otherwise(df_new.service_type_col))
df_new = df_new.withColumn('service_type_col', 
sf.when((sf.col('service_type_col') == 0), 
'DURABLE_MEDICAL_EQUIPMENT').otherwise(df_new.service_type_col))
df_new = df_new.withColumn('service_type_col', 
sf.when((sf.col('CPT_Name').isin(['CHIROPRACTIC MANIPULATION'])) & 
(sf.col('service_type_col') == 0), 
'CHIROPRACTIC').otherwise(df_new.service_type_col))
df_new = df_new.withColumn('service_type_col', 
sf.when((sf.col('service_type_col') == 0), 
'AMBULANCE').otherwise(df_new.service_type_col))
df_new = df_new.withColumn('service_type_col', 
sf.when((sf.col('service_type_col') == 0), 
'RX_MAIL').otherwise(df_new.service_type_col))

df_new.show()
{code}

> spark (pyspark) crashes unpredictably when using show() or toPandas()
> -
>
> Key: SPARK-21393
> URL: https://issues.apache.org/jira/browse/SPARK-21393
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.1.1, 2.2.0
> Environment: Windows 10
> python 2.7
>Reporter: Zahra
> Attachments: Data.zip, working_ST_pyspark.py
>
>
> unpredictably run into this error either when using 
> `pyspark.sql.DataFrame.show()` or `pyspark.sql.DataFrame.toPandas()`
> error log starts with  (truncated) :
> {noformat}
> 17/07/12 16:03:09 ERROR CodeGenerator: failed to compile: 
> org.codehaus.janino.JaninoRuntimeException: Code of method 
> "apply_47$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V"
>  of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection"
>  grows beyond 64 KB
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */   return new SpecificUnsafeProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificUnsafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */   private scala.collection.immutable.Set set;
> /* 009 */   private scala.collection.immutable.Set set1;
> /* 010 */   private scala.collection.immutable.Set set2;
> /* 011 */   private scala.collection.immutable.Set set3;
> /* 012 */   private UTF8String.IntWrapper wrapper;
> /* 013 */   private UTF8String.IntWrapper wrapper1;
> /* 014 */   private scala.collection.immutable.Set set4;
> /* 015 */   private UTF8String.IntWrapper wrapper2;
> /* 016 */   private UTF8String.IntWrapper wrapper3;
> /* 017 */   private scala.collection.immutable.Set set5;
> /* 018 */   private scala.collection.immutable.Set set6;
> /* 019 */   private scala.collection.immutable.Set set7;
> /* 020 */   private UTF8String.IntWrapper wrapper4;
> /* 021 */   private UTF8String.IntWrapper wrapper5;
> /* 022 */   private scala.collection.immutable.Set set8;
> /* 023 */   private UTF8String.IntWrapper wrapper6;
> /* 024 */   private UTF8String.IntWrapper wrapper7;
> /* 025 */   private scala.collection.immutable.Set set9;
> /* 026 */   private scala.collection.immutable.Set set10;
> /* 027 */   private scala.collection.immutable.Set set11;
> /* 028 */   private UTF8String.IntWrapper wrapper8;
> /* 029

[jira] [Comment Edited] (SPARK-21393) spark (pyspark) crashes unpredictably when using show() or toPandas()

2017-07-13 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086189#comment-16086189
 ] 

Kazuaki Ishizaki edited comment on SPARK-21393 at 7/13/17 6:39 PM:
---

Thank you for uploading the files. When I insert {{df_new.show()}} at appropriate 
places, I can reproduce this problem on Spark 2.1.1 or Spark 2.2.
I am reducing the number of lines in this program.


was (Author: kiszk):
Thank you for uploading files. When I insert {df_new.show()} at appropriate 
places, I can reproduce this problem on Spark 2.1.1 or Spark 2.2.
I am reducing the program.

> spark (pyspark) crashes unpredictably when using show() or toPandas()
> -
>
> Key: SPARK-21393
> URL: https://issues.apache.org/jira/browse/SPARK-21393
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.1.1, 2.2.0
> Environment: Windows 10
> python 2.7
>Reporter: Zahra
> Attachments: Data.zip, working_ST_pyspark.py
>
>
> unpredictably run into this error either when using 
> `pyspark.sql.DataFrame.show()` or `pyspark.sql.DataFrame.toPandas()`
> error log starts with  (truncated) :
> {noformat}
> 17/07/12 16:03:09 ERROR CodeGenerator: failed to compile: 
> org.codehaus.janino.JaninoRuntimeException: Code of method 
> "apply_47$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V"
>  of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection"
>  grows beyond 64 KB
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */   return new SpecificUnsafeProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificUnsafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */   private scala.collection.immutable.Set set;
> /* 009 */   private scala.collection.immutable.Set set1;
> /* 010 */   private scala.collection.immutable.Set set2;
> /* 011 */   private scala.collection.immutable.Set set3;
> /* 012 */   private UTF8String.IntWrapper wrapper;
> /* 013 */   private UTF8String.IntWrapper wrapper1;
> /* 014 */   private scala.collection.immutable.Set set4;
> /* 015 */   private UTF8String.IntWrapper wrapper2;
> /* 016 */   private UTF8String.IntWrapper wrapper3;
> /* 017 */   private scala.collection.immutable.Set set5;
> /* 018 */   private scala.collection.immutable.Set set6;
> /* 019 */   private scala.collection.immutable.Set set7;
> /* 020 */   private UTF8String.IntWrapper wrapper4;
> /* 021 */   private UTF8String.IntWrapper wrapper5;
> /* 022 */   private scala.collection.immutable.Set set8;
> /* 023 */   private UTF8String.IntWrapper wrapper6;
> /* 024 */   private UTF8String.IntWrapper wrapper7;
> /* 025 */   private scala.collection.immutable.Set set9;
> /* 026 */   private scala.collection.immutable.Set set10;
> /* 027 */   private scala.collection.immutable.Set set11;
> /* 028 */   private UTF8String.IntWrapper wrapper8;
> /* 029 */   private UTF8String.IntWrapper wrapper9;
> /* 030 */   private scala.collection.immutable.Set set12;
> /* 031 */   private UTF8String.IntWrapper wrapper10;
> /* 032 */   private UTF8String.IntWrapper wrapper11;
> /* 033 */   private scala.collection.immutable.Set set13;
> /* 034 */   private scala.collection.immutable.Set set14;
> /* 035 */   private scala.collection.immutable.Set set15;
> /* 036 */   private UTF8String.IntWrapper wrapper12;
> /* 037 */   private UTF8String.IntWrapper wrapper13;
> /* 038 */   private scala.collection.immutable.Set set16;
> /* 039 */   private UTF8String.IntWrapper wrapper14;
> /* 040 */   private UTF8String.IntWrapper wrapper15;
> /* 041 */   private scala.collection.immutable.Set set17;
> /* 042 */   private scala.collection.immutable.Set set18;
> /* 043 */   private scala.collection.immutable.Set set19;
> /* 044 */   private UTF8String.IntWrapper wrapper16;
> /* 045 */   private UTF8String.IntWrapper wrapper17;
> /* 046 */   private scala.collection.immutable.Set set20;
> /* 047 */   private UTF8String.IntWrapper wrapper18;
> /* 048 */   private UTF8String.IntWrapper wrapper19;
> /* 049 */   private scala.collection.immutable.Set set21;
> /* 050 */   private scala.collection.immutable.Set set22;
> /* 051 */   private scala.collection.immutable.Set set23;
> /* 052 */   private UTF8String.IntWrapper wrapper20;
> /* 053 */   private UTF8String.IntWrapper wrapper21;
> /* 054 */   private 

[jira] [Commented] (SPARK-21393) spark (pyspark) crashes unpredictably when using show() or toPandas()

2017-07-13 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16086189#comment-16086189
 ] 

Kazuaki Ishizaki commented on SPARK-21393:
--

Thank you for uploading the files. When I insert {{df_new.show()}} at appropriate 
places, I can reproduce this problem on Spark 2.1.1 or Spark 2.2.
I am reducing the program.

> spark (pyspark) crashes unpredictably when using show() or toPandas()
> -
>
> Key: SPARK-21393
> URL: https://issues.apache.org/jira/browse/SPARK-21393
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.1.1, 2.2.0
> Environment: Windows 10
> python 2.7
>Reporter: Zahra
> Attachments: Data.zip, working_ST_pyspark.py
>
>
> unpredictably run into this error either when using 
> `pyspark.sql.DataFrame.show()` or `pyspark.sql.DataFrame.toPandas()`
> error log starts with  (truncated) :
> {noformat}
> 17/07/12 16:03:09 ERROR CodeGenerator: failed to compile: 
> org.codehaus.janino.JaninoRuntimeException: Code of method 
> "apply_47$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V"
>  of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection"
>  grows beyond 64 KB
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */   return new SpecificUnsafeProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificUnsafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */   private scala.collection.immutable.Set set;
> /* 009 */   private scala.collection.immutable.Set set1;
> /* 010 */   private scala.collection.immutable.Set set2;
> /* 011 */   private scala.collection.immutable.Set set3;
> /* 012 */   private UTF8String.IntWrapper wrapper;
> /* 013 */   private UTF8String.IntWrapper wrapper1;
> /* 014 */   private scala.collection.immutable.Set set4;
> /* 015 */   private UTF8String.IntWrapper wrapper2;
> /* 016 */   private UTF8String.IntWrapper wrapper3;
> /* 017 */   private scala.collection.immutable.Set set5;
> /* 018 */   private scala.collection.immutable.Set set6;
> /* 019 */   private scala.collection.immutable.Set set7;
> /* 020 */   private UTF8String.IntWrapper wrapper4;
> /* 021 */   private UTF8String.IntWrapper wrapper5;
> /* 022 */   private scala.collection.immutable.Set set8;
> /* 023 */   private UTF8String.IntWrapper wrapper6;
> /* 024 */   private UTF8String.IntWrapper wrapper7;
> /* 025 */   private scala.collection.immutable.Set set9;
> /* 026 */   private scala.collection.immutable.Set set10;
> /* 027 */   private scala.collection.immutable.Set set11;
> /* 028 */   private UTF8String.IntWrapper wrapper8;
> /* 029 */   private UTF8String.IntWrapper wrapper9;
> /* 030 */   private scala.collection.immutable.Set set12;
> /* 031 */   private UTF8String.IntWrapper wrapper10;
> /* 032 */   private UTF8String.IntWrapper wrapper11;
> /* 033 */   private scala.collection.immutable.Set set13;
> /* 034 */   private scala.collection.immutable.Set set14;
> /* 035 */   private scala.collection.immutable.Set set15;
> /* 036 */   private UTF8String.IntWrapper wrapper12;
> /* 037 */   private UTF8String.IntWrapper wrapper13;
> /* 038 */   private scala.collection.immutable.Set set16;
> /* 039 */   private UTF8String.IntWrapper wrapper14;
> /* 040 */   private UTF8String.IntWrapper wrapper15;
> /* 041 */   private scala.collection.immutable.Set set17;
> /* 042 */   private scala.collection.immutable.Set set18;
> /* 043 */   private scala.collection.immutable.Set set19;
> /* 044 */   private UTF8String.IntWrapper wrapper16;
> /* 045 */   private UTF8String.IntWrapper wrapper17;
> /* 046 */   private scala.collection.immutable.Set set20;
> /* 047 */   private UTF8String.IntWrapper wrapper18;
> /* 048 */   private UTF8String.IntWrapper wrapper19;
> /* 049 */   private scala.collection.immutable.Set set21;
> /* 050 */   private scala.collection.immutable.Set set22;
> /* 051 */   private scala.collection.immutable.Set set23;
> /* 052 */   private UTF8String.IntWrapper wrapper20;
> /* 053 */   private UTF8String.IntWrapper wrapper21;
> /* 054 */   private scala.collection.immutable.Set set24;
> /* 055 */   private UTF8String.IntWrapper wrapper22;
> /* 056 */   private UTF8String.IntWrapper wrapper23;
> /* 057 */   private scala.collection.immutable.Set set25;
> /* 058 */   private scala.collection.

[jira] [Updated] (SPARK-21393) spark (pyspark) crashes unpredictably when using show() or toPandas()

2017-07-13 Thread Kazuaki Ishizaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki updated SPARK-21393:
-
Affects Version/s: 2.2.0

> spark (pyspark) crashes unpredictably when using show() or toPandas()
> -
>
> Key: SPARK-21393
> URL: https://issues.apache.org/jira/browse/SPARK-21393
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.1.1, 2.2.0
> Environment: Windows 10
> python 2.7
>Reporter: Zahra
> Attachments: Data.zip, working_ST_pyspark.py
>
>
> unpredictably run into this error either when using 
> `pyspark.sql.DataFrame.show()` or `pyspark.sql.DataFrame.toPandas()`
> error log starts with  (truncated) :
> {noformat}
> 17/07/12 16:03:09 ERROR CodeGenerator: failed to compile: 
> org.codehaus.janino.JaninoRuntimeException: Code of method 
> "apply_47$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V"
>  of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection"
>  grows beyond 64 KB
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */   return new SpecificUnsafeProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificUnsafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */   private scala.collection.immutable.Set set;
> /* 009 */   private scala.collection.immutable.Set set1;
> /* 010 */   private scala.collection.immutable.Set set2;
> /* 011 */   private scala.collection.immutable.Set set3;
> /* 012 */   private UTF8String.IntWrapper wrapper;
> /* 013 */   private UTF8String.IntWrapper wrapper1;
> /* 014 */   private scala.collection.immutable.Set set4;
> /* 015 */   private UTF8String.IntWrapper wrapper2;
> /* 016 */   private UTF8String.IntWrapper wrapper3;
> /* 017 */   private scala.collection.immutable.Set set5;
> /* 018 */   private scala.collection.immutable.Set set6;
> /* 019 */   private scala.collection.immutable.Set set7;
> /* 020 */   private UTF8String.IntWrapper wrapper4;
> /* 021 */   private UTF8String.IntWrapper wrapper5;
> /* 022 */   private scala.collection.immutable.Set set8;
> /* 023 */   private UTF8String.IntWrapper wrapper6;
> /* 024 */   private UTF8String.IntWrapper wrapper7;
> /* 025 */   private scala.collection.immutable.Set set9;
> /* 026 */   private scala.collection.immutable.Set set10;
> /* 027 */   private scala.collection.immutable.Set set11;
> /* 028 */   private UTF8String.IntWrapper wrapper8;
> /* 029 */   private UTF8String.IntWrapper wrapper9;
> /* 030 */   private scala.collection.immutable.Set set12;
> /* 031 */   private UTF8String.IntWrapper wrapper10;
> /* 032 */   private UTF8String.IntWrapper wrapper11;
> /* 033 */   private scala.collection.immutable.Set set13;
> /* 034 */   private scala.collection.immutable.Set set14;
> /* 035 */   private scala.collection.immutable.Set set15;
> /* 036 */   private UTF8String.IntWrapper wrapper12;
> /* 037 */   private UTF8String.IntWrapper wrapper13;
> /* 038 */   private scala.collection.immutable.Set set16;
> /* 039 */   private UTF8String.IntWrapper wrapper14;
> /* 040 */   private UTF8String.IntWrapper wrapper15;
> /* 041 */   private scala.collection.immutable.Set set17;
> /* 042 */   private scala.collection.immutable.Set set18;
> /* 043 */   private scala.collection.immutable.Set set19;
> /* 044 */   private UTF8String.IntWrapper wrapper16;
> /* 045 */   private UTF8String.IntWrapper wrapper17;
> /* 046 */   private scala.collection.immutable.Set set20;
> /* 047 */   private UTF8String.IntWrapper wrapper18;
> /* 048 */   private UTF8String.IntWrapper wrapper19;
> /* 049 */   private scala.collection.immutable.Set set21;
> /* 050 */   private scala.collection.immutable.Set set22;
> /* 051 */   private scala.collection.immutable.Set set23;
> /* 052 */   private UTF8String.IntWrapper wrapper20;
> /* 053 */   private UTF8String.IntWrapper wrapper21;
> /* 054 */   private scala.collection.immutable.Set set24;
> /* 055 */   private UTF8String.IntWrapper wrapper22;
> /* 056 */   private UTF8String.IntWrapper wrapper23;
> /* 057 */   private scala.collection.immutable.Set set25;
> /* 058 */   private scala.collection.immutable.Set set26;
> /* 059 */   private scala.collection.immutable.Set set27;
> /* 060 */   private UTF8String.IntWrapper wrapper24;
> /* 061 */   private UTF

[jira] [Commented] (SPARK-21391) Cannot convert a Seq of Map whose value type is again a seq, into a dataset

2017-07-13 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085939#comment-16085939
 ] 

Kazuaki Ishizaki commented on SPARK-21391:
--

I created [a PR|https://github.com/apache/spark/pull/18626] to solve this problem in 
Spark 2.1.

> Cannot convert a Seq of Map whose value type is again a seq, into a dataset 
> 
>
> Key: SPARK-21391
> URL: https://issues.apache.org/jira/browse/SPARK-21391
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
> Environment: Seen on mac OSX, scala 2.11, java 8
>Reporter: indraneel rao
>
> There is an error while trying to create a dataset from a sequence of Maps, 
> whose values have any kind of collections. Even when they are wrapped in a 
> case class. 
> Eg : The following piece of code throws an error:
>
> {code:java}
> case class Values(values: Seq[Double])
> case class ItemProperties(properties:Map[String,Values])
> val l1 = List(ItemProperties(
>   Map(
> "A1" -> Values(Seq(1.0,2.0)),
> "B1" -> Values(Seq(44.0,55.0))
>   )
> ),
>   ItemProperties(
> Map(
>   "A2" -> Values(Seq(123.0,25.0)),
>   "B2" -> Values(Seq(445.0,35.0))
> )
>   )
> )
> l1.toDS().show()
> {code}
> Here's the error:
> 17/07/12 21:59:35 ERROR CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 65, Column 46: Expression "ExternalMapToCatalyst_value_isNull0" is not an 
> rvalue
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */   return new SpecificUnsafeProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificUnsafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */   private boolean resultIsNull;
> /* 009 */   private java.lang.String argValue;
> /* 010 */   private Object[] values;
> /* 011 */   private boolean resultIsNull1;
> /* 012 */   private scala.collection.Seq argValue1;
> /* 013 */   private boolean isNull11;
> /* 014 */   private boolean value11;
> /* 015 */   private boolean isNull12;
> /* 016 */   private InternalRow value12;
> /* 017 */   private boolean isNull13;
> /* 018 */   private InternalRow value13;
> /* 019 */   private UnsafeRow result;
> /* 020 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder holder;
> /* 021 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter;
> /* 022 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter;
> /* 023 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter1;
> /* 024 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter1;
> /* 025 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter2;
> /* 026 */
> /* 027 */   public SpecificUnsafeProjection(Object[] references) {
> /* 028 */ this.references = references;
> /* 029 */
> /* 030 */
> /* 031 */ this.values = null;
> /* 032 */
> /* 033 */
> /* 034 */ isNull11 = false;
> /* 035 */ value11 = false;
> /* 036 */ isNull12 = false;
> /* 037 */ value12 = null;
> /* 038 */ isNull13 = false;
> /* 039 */ value13 = null;
> /* 040 */ result = new UnsafeRow(1);
> /* 041 */ this.holder = new 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(result, 32);
> /* 042 */ this.rowWriter = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1);
> /* 043 */ this.arrayWriter = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 044 */ this.arrayWriter1 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 045 */ this.rowWriter1 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1);
> /* 046 */ this.arrayWriter2 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 047 */
> /* 048 */   }
> /* 049 */
> /* 050 */   public void initialize(int partitionIndex) {
> /* 051 */
> /* 052 */   }
> /* 053 */
> /* 054 */
> /* 055 */   private void evalIfTrueExpr(InternalRow i) {
> /* 056

[jira] [Updated] (SPARK-21390) Dataset filter api inconsistency

2017-07-13 Thread Kazuaki Ishizaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki updated SPARK-21390:
-
Affects Version/s: 2.1.0

> Dataset filter api inconsistency
> 
>
> Key: SPARK-21390
> URL: https://issues.apache.org/jira/browse/SPARK-21390
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1, 2.1.0, 2.2.0
>Reporter: Gheorghe Gheorghe
>Priority: Minor
>
> Hello everybody, 
> I've encountered a strange situation with the spark-shell.
> When I run the code below in my IDE the second test case prints as expected 
> count "1". However, when I run the same code using the spark-shell in the 
> second test case I get 0 back as a count. 
> I've made sure that I'm running scala 2.11.8 and spark 2.0.1 in both my IDE 
> and spark-shell. 
> {code:java}
>   import org.apache.spark.sql.Dataset
>   case class SomeClass(field1:String, field2:String)
>   val filterCondition: Seq[SomeClass] = Seq( SomeClass("00", "01") )
>   // Test 1
>   val filterMe1: Dataset[SomeClass] = Seq( SomeClass("00", "01") ).toDS
>   
>   println("Works fine!" +filterMe1.filter(filterCondition.contains(_)).count)
>   
>   // Test 2
>   case class OtherClass(field1:String, field2:String)
>   
>   val filterMe2 = Seq( OtherClass("00", "01"), OtherClass("00", "02")).toDS
>   println("Fail, count should return 1: " + filterMe2.filter(x=> 
> filterCondition.contains(SomeClass(x.field1, x.field2))).count)
> {code}
> Note if I transform the dataset first I get 1 back as expected.
> {code:java}
>  println(filterMe2.map(x=> SomeClass(x.field1, 
> x.field2)).filter(filterCondition.contains(_)).count)
> {code}
> Is this a bug? I can see that this filter function has been marked as 
> experimental 
> https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1)






[jira] [Updated] (SPARK-21390) Dataset filter api inconsistency

2017-07-13 Thread Kazuaki Ishizaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki updated SPARK-21390:
-
Affects Version/s: 2.2.0

> Dataset filter api inconsistency
> 
>
> Key: SPARK-21390
> URL: https://issues.apache.org/jira/browse/SPARK-21390
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1, 2.1.0, 2.2.0
>Reporter: Gheorghe Gheorghe
>Priority: Minor
>
> Hello everybody, 
> I've encountered a strange situation with the spark-shell.
> When I run the code below in my IDE the second test case prints as expected 
> count "1". However, when I run the same code using the spark-shell in the 
> second test case I get 0 back as a count. 
> I've made sure that I'm running scala 2.11.8 and spark 2.0.1 in both my IDE 
> and spark-shell. 
> {code:java}
>   import org.apache.spark.sql.Dataset
>   case class SomeClass(field1:String, field2:String)
>   val filterCondition: Seq[SomeClass] = Seq( SomeClass("00", "01") )
>   // Test 1
>   val filterMe1: Dataset[SomeClass] = Seq( SomeClass("00", "01") ).toDS
>   
>   println("Works fine!" +filterMe1.filter(filterCondition.contains(_)).count)
>   
>   // Test 2
>   case class OtherClass(field1:String, field2:String)
>   
>   val filterMe2 = Seq( OtherClass("00", "01"), OtherClass("00", "02")).toDS
>   println("Fail, count should return 1: " + filterMe2.filter(x=> 
> filterCondition.contains(SomeClass(x.field1, x.field2))).count)
> {code}
> Note if I transform the dataset first I get 1 back as expected.
> {code:java}
>  println(filterMe2.map(x=> SomeClass(x.field1, 
> x.field2)).filter(filterCondition.contains(_)).count)
> {code}
> Is this a bug? I can see that this filter function has been marked as 
> experimental 
> https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1)






[jira] [Commented] (SPARK-21393) spark (pyspark) crashes unpredictably when using show() or toPandas()

2017-07-13 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085726#comment-16085726
 ] 

Kazuaki Ishizaki commented on SPARK-21393:
--

This program seems to require 7 CSV files to run. Could you please attach these 
CSV files?

> spark (pyspark) crashes unpredictably when using show() or toPandas()
> -
>
> Key: SPARK-21393
> URL: https://issues.apache.org/jira/browse/SPARK-21393
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.1.1
> Environment: Windows 10
> python 2.7
>Reporter: Zahra
> Attachments: working_ST_pyspark.py
>
>
> unpredictably run into this error either when using 
> `pyspark.sql.DataFrame.show()` or `pyspark.sql.DataFrame.toPandas()`
> error log starts with  (truncated) :
> {noformat}
> 17/07/12 16:03:09 ERROR CodeGenerator: failed to compile: 
> org.codehaus.janino.JaninoRuntimeException: Code of method 
> "apply_47$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V"
>  of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection"
>  grows beyond 64 KB
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */   return new SpecificUnsafeProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificUnsafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */   private scala.collection.immutable.Set set;
> /* 009 */   private scala.collection.immutable.Set set1;
> /* 010 */   private scala.collection.immutable.Set set2;
> /* 011 */   private scala.collection.immutable.Set set3;
> /* 012 */   private UTF8String.IntWrapper wrapper;
> /* 013 */   private UTF8String.IntWrapper wrapper1;
> /* 014 */   private scala.collection.immutable.Set set4;
> /* 015 */   private UTF8String.IntWrapper wrapper2;
> /* 016 */   private UTF8String.IntWrapper wrapper3;
> /* 017 */   private scala.collection.immutable.Set set5;
> /* 018 */   private scala.collection.immutable.Set set6;
> /* 019 */   private scala.collection.immutable.Set set7;
> /* 020 */   private UTF8String.IntWrapper wrapper4;
> /* 021 */   private UTF8String.IntWrapper wrapper5;
> /* 022 */   private scala.collection.immutable.Set set8;
> /* 023 */   private UTF8String.IntWrapper wrapper6;
> /* 024 */   private UTF8String.IntWrapper wrapper7;
> /* 025 */   private scala.collection.immutable.Set set9;
> /* 026 */   private scala.collection.immutable.Set set10;
> /* 027 */   private scala.collection.immutable.Set set11;
> /* 028 */   private UTF8String.IntWrapper wrapper8;
> /* 029 */   private UTF8String.IntWrapper wrapper9;
> /* 030 */   private scala.collection.immutable.Set set12;
> /* 031 */   private UTF8String.IntWrapper wrapper10;
> /* 032 */   private UTF8String.IntWrapper wrapper11;
> /* 033 */   private scala.collection.immutable.Set set13;
> /* 034 */   private scala.collection.immutable.Set set14;
> /* 035 */   private scala.collection.immutable.Set set15;
> /* 036 */   private UTF8String.IntWrapper wrapper12;
> /* 037 */   private UTF8String.IntWrapper wrapper13;
> /* 038 */   private scala.collection.immutable.Set set16;
> /* 039 */   private UTF8String.IntWrapper wrapper14;
> /* 040 */   private UTF8String.IntWrapper wrapper15;
> /* 041 */   private scala.collection.immutable.Set set17;
> /* 042 */   private scala.collection.immutable.Set set18;
> /* 043 */   private scala.collection.immutable.Set set19;
> /* 044 */   private UTF8String.IntWrapper wrapper16;
> /* 045 */   private UTF8String.IntWrapper wrapper17;
> /* 046 */   private scala.collection.immutable.Set set20;
> /* 047 */   private UTF8String.IntWrapper wrapper18;
> /* 048 */   private UTF8String.IntWrapper wrapper19;
> /* 049 */   private scala.collection.immutable.Set set21;
> /* 050 */   private scala.collection.immutable.Set set22;
> /* 051 */   private scala.collection.immutable.Set set23;
> /* 052 */   private UTF8String.IntWrapper wrapper20;
> /* 053 */   private UTF8String.IntWrapper wrapper21;
> /* 054 */   private scala.collection.immutable.Set set24;
> /* 055 */   private UTF8String.IntWrapper wrapper22;
> /* 056 */   private UTF8String.IntWrapper wrapper23;
> /* 057 */   private scala.collection.immutable.Set set25;
> /* 058 */   private scala.collection.immutable.Set set26;
> /* 059 */   private scala.collection.immutable.Set 

[jira] [Commented] (SPARK-21391) Cannot convert a Seq of Map whose value type is again a seq, into a dataset

2017-07-13 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085342#comment-16085342
 ] 

Kazuaki Ishizaki commented on SPARK-21391:
--

[~neelrr] Do you want to have this fix in a future 2.1 release? If so, I will make 
a PR for the backport.

> Cannot convert a Seq of Map whose value type is again a seq, into a dataset 
> 
>
> Key: SPARK-21391
> URL: https://issues.apache.org/jira/browse/SPARK-21391
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
> Environment: Seen on mac OSX, scala 2.11, java 8
>Reporter: indraneel rao
>
> There is an error while trying to create a dataset from a sequence of Maps, 
> whose values have any kind of collections. Even when they are wrapped in a 
> case class. 
> Eg : The following piece of code throws an error:
>
> {code:java}
> case class Values(values: Seq[Double])
> case class ItemProperties(properties:Map[String,Values])
> val l1 = List(ItemProperties(
>   Map(
> "A1" -> Values(Seq(1.0,2.0)),
> "B1" -> Values(Seq(44.0,55.0))
>   )
> ),
>   ItemProperties(
> Map(
>   "A2" -> Values(Seq(123.0,25.0)),
>   "B2" -> Values(Seq(445.0,35.0))
> )
>   )
> )
> l1.toDS().show()
> {code}
> Here's the error:
> 17/07/12 21:59:35 ERROR CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 65, Column 46: Expression "ExternalMapToCatalyst_value_isNull0" is not an 
> rvalue
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */   return new SpecificUnsafeProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificUnsafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */   private boolean resultIsNull;
> /* 009 */   private java.lang.String argValue;
> /* 010 */   private Object[] values;
> /* 011 */   private boolean resultIsNull1;
> /* 012 */   private scala.collection.Seq argValue1;
> /* 013 */   private boolean isNull11;
> /* 014 */   private boolean value11;
> /* 015 */   private boolean isNull12;
> /* 016 */   private InternalRow value12;
> /* 017 */   private boolean isNull13;
> /* 018 */   private InternalRow value13;
> /* 019 */   private UnsafeRow result;
> /* 020 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder holder;
> /* 021 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter;
> /* 022 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter;
> /* 023 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter1;
> /* 024 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter1;
> /* 025 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter2;
> /* 026 */
> /* 027 */   public SpecificUnsafeProjection(Object[] references) {
> /* 028 */ this.references = references;
> /* 029 */
> /* 030 */
> /* 031 */ this.values = null;
> /* 032 */
> /* 033 */
> /* 034 */ isNull11 = false;
> /* 035 */ value11 = false;
> /* 036 */ isNull12 = false;
> /* 037 */ value12 = null;
> /* 038 */ isNull13 = false;
> /* 039 */ value13 = null;
> /* 040 */ result = new UnsafeRow(1);
> /* 041 */ this.holder = new 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(result, 32);
> /* 042 */ this.rowWriter = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1);
> /* 043 */ this.arrayWriter = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 044 */ this.arrayWriter1 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 045 */ this.rowWriter1 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1);
> /* 046 */ this.arrayWriter2 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 047 */
> /* 048 */   }
> /* 049 */
> /* 050 */   public void initialize(int partitionIndex) {
> /* 051 */
> /* 052 */   }
> /* 053 */
> /* 054 */
> /* 055 */   private void evalIfTrueExpr(InternalRow i) {
> /* 056

[jira] [Comment Edited] (SPARK-21391) Cannot convert a Seq of Map whose value type is again a seq, into a dataset

2017-07-12 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085099#comment-16085099
 ] 

Kazuaki Ishizaki edited comment on SPARK-21391 at 7/13/17 3:42 AM:
---

[~hyukjin.kwon] I think that SPARK-19254 and/or SPARK-19104 fixed this issue.


was (Author: kiszk):
[~hyukjin.kwon] I think that 
[SPARK-19254|https://issues.apache.org/jira/browse/SPARK-19254] and/or 
[SPARK-19104|https://issues.apache.org/jira/browse/SPARK-19104] fixed this 
issue.

> Cannot convert a Seq of Map whose value type is again a seq, into a dataset 
> 
>
> Key: SPARK-21391
> URL: https://issues.apache.org/jira/browse/SPARK-21391
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
> Environment: Seen on mac OSX, scala 2.11, java 8
>Reporter: indraneel rao
>
> There is an error while trying to create a dataset from a sequence of Maps, 
> whose values have any kind of collections. Even when they are wrapped in a 
> case class. 
> Eg : The following piece of code throws an error:
>
> {code:java}
> case class Values(values: Seq[Double])
> case class ItemProperties(properties:Map[String,Values])
> val l1 = List(ItemProperties(
>   Map(
> "A1" -> Values(Seq(1.0,2.0)),
> "B1" -> Values(Seq(44.0,55.0))
>   )
> ),
>   ItemProperties(
> Map(
>   "A2" -> Values(Seq(123.0,25.0)),
>   "B2" -> Values(Seq(445.0,35.0))
> )
>   )
> )
> l1.toDS().show()
> {code}
> Here's the error:
> 17/07/12 21:59:35 ERROR CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 65, Column 46: Expression "ExternalMapToCatalyst_value_isNull0" is not an 
> rvalue
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */   return new SpecificUnsafeProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificUnsafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */   private boolean resultIsNull;
> /* 009 */   private java.lang.String argValue;
> /* 010 */   private Object[] values;
> /* 011 */   private boolean resultIsNull1;
> /* 012 */   private scala.collection.Seq argValue1;
> /* 013 */   private boolean isNull11;
> /* 014 */   private boolean value11;
> /* 015 */   private boolean isNull12;
> /* 016 */   private InternalRow value12;
> /* 017 */   private boolean isNull13;
> /* 018 */   private InternalRow value13;
> /* 019 */   private UnsafeRow result;
> /* 020 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder holder;
> /* 021 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter;
> /* 022 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter;
> /* 023 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter1;
> /* 024 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter1;
> /* 025 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter2;
> /* 026 */
> /* 027 */   public SpecificUnsafeProjection(Object[] references) {
> /* 028 */ this.references = references;
> /* 029 */
> /* 030 */
> /* 031 */ this.values = null;
> /* 032 */
> /* 033 */
> /* 034 */ isNull11 = false;
> /* 035 */ value11 = false;
> /* 036 */ isNull12 = false;
> /* 037 */ value12 = null;
> /* 038 */ isNull13 = false;
> /* 039 */ value13 = null;
> /* 040 */ result = new UnsafeRow(1);
> /* 041 */ this.holder = new 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(result, 32);
> /* 042 */ this.rowWriter = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1);
> /* 043 */ this.arrayWriter = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 044 */ this.arrayWriter1 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 045 */ this.rowWriter1 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1);
> /* 046 */ this.arrayWriter2 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /

[jira] [Commented] (SPARK-21391) Cannot convert a Seq of Map whose value type is again a seq, into a dataset

2017-07-12 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16085099#comment-16085099
 ] 

Kazuaki Ishizaki commented on SPARK-21391:
--

[~hyukjin.kwon] I think that 
[SPARK-19254|https://issues.apache.org/jira/browse/SPARK-19254] and/or 
[SPARK-19104|https://issues.apache.org/jira/browse/SPARK-19104] fixed this 
issue.

> Cannot convert a Seq of Map whose value type is again a seq, into a dataset 
> 
>
> Key: SPARK-21391
> URL: https://issues.apache.org/jira/browse/SPARK-21391
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
> Environment: Seen on mac OSX, scala 2.11, java 8
>Reporter: indraneel rao
>
> There is an error while trying to create a dataset from a sequence of Maps, 
> whose values have any kind of collections. Even when they are wrapped in a 
> case class. 
> Eg : The following piece of code throws an error:
>
> {code:java}
> case class Values(values: Seq[Double])
> case class ItemProperties(properties:Map[String,Values])
> val l1 = List(ItemProperties(
>   Map(
> "A1" -> Values(Seq(1.0,2.0)),
> "B1" -> Values(Seq(44.0,55.0))
>   )
> ),
>   ItemProperties(
> Map(
>   "A2" -> Values(Seq(123.0,25.0)),
>   "B2" -> Values(Seq(445.0,35.0))
> )
>   )
> )
> l1.toDS().show()
> {code}
> Here's the error:
> 17/07/12 21:59:35 ERROR CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 65, Column 46: Expression "ExternalMapToCatalyst_value_isNull0" is not an 
> rvalue
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */   return new SpecificUnsafeProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificUnsafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */   private boolean resultIsNull;
> /* 009 */   private java.lang.String argValue;
> /* 010 */   private Object[] values;
> /* 011 */   private boolean resultIsNull1;
> /* 012 */   private scala.collection.Seq argValue1;
> /* 013 */   private boolean isNull11;
> /* 014 */   private boolean value11;
> /* 015 */   private boolean isNull12;
> /* 016 */   private InternalRow value12;
> /* 017 */   private boolean isNull13;
> /* 018 */   private InternalRow value13;
> /* 019 */   private UnsafeRow result;
> /* 020 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder holder;
> /* 021 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter;
> /* 022 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter;
> /* 023 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter1;
> /* 024 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter1;
> /* 025 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter2;
> /* 026 */
> /* 027 */   public SpecificUnsafeProjection(Object[] references) {
> /* 028 */ this.references = references;
> /* 029 */
> /* 030 */
> /* 031 */ this.values = null;
> /* 032 */
> /* 033 */
> /* 034 */ isNull11 = false;
> /* 035 */ value11 = false;
> /* 036 */ isNull12 = false;
> /* 037 */ value12 = null;
> /* 038 */ isNull13 = false;
> /* 039 */ value13 = null;
> /* 040 */ result = new UnsafeRow(1);
> /* 041 */ this.holder = new 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(result, 32);
> /* 042 */ this.rowWriter = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1);
> /* 043 */ this.arrayWriter = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 044 */ this.arrayWriter1 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 045 */ this.rowWriter1 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1);
> /* 046 */ this.arrayWriter2 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 047 */
> /* 048 */   }
> /* 049 */
> /* 050 */   public void initialize(int partitionIndex) {
> /* 051 */
> /* 052 */   }
> /* 053 */
>

[jira] [Comment Edited] (SPARK-21391) Cannot convert a Seq of Map whose value type is again a seq, into a dataset

2017-07-12 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16084333#comment-16084333
 ] 

Kazuaki Ishizaki edited comment on SPARK-21391 at 7/12/17 5:19 PM:
---

This program works with the master branch or Spark 2.2. Would it be possible to use 
Spark 2.2?

{code}
++
|  properties|
++
|Map(A1 -> [Wrappe...|
|Map(A2 -> [Wrappe...|
++
{code}



was (Author: kiszk):
This program works with the master and Spark 2.2. Would it be possible to use 
Spark 2.2?

{code}
++
|  properties|
++
|Map(A1 -> [Wrappe...|
|Map(A2 -> [Wrappe...|
++
{code}


> Cannot convert a Seq of Map whose value type is again a seq, into a dataset 
> 
>
> Key: SPARK-21391
> URL: https://issues.apache.org/jira/browse/SPARK-21391
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
> Environment: Seen on mac OSX, scala 2.11, java 8
>Reporter: indraneel rao
>
> There is an error while trying to create a dataset from a sequence of Maps, 
> whose values have any kind of collections. Even when they are wrapped in a 
> case class. 
> Eg : The following piece of code throws an error:
>
> {code:java}
> case class Values(values: Seq[Double])
> case class ItemProperties(properties:Map[String,Values])
> val l1 = List(ItemProperties(
>   Map(
> "A1" -> Values(Seq(1.0,2.0)),
> "B1" -> Values(Seq(44.0,55.0))
>   )
> ),
>   ItemProperties(
> Map(
>   "A2" -> Values(Seq(123.0,25.0)),
>   "B2" -> Values(Seq(445.0,35.0))
> )
>   )
> )
> l1.toDS().show()
> {code}
> Here's the error:
> 17/07/12 21:59:35 ERROR CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 65, Column 46: Expression "ExternalMapToCatalyst_value_isNull0" is not an 
> rvalue
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */   return new SpecificUnsafeProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificUnsafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */   private boolean resultIsNull;
> /* 009 */   private java.lang.String argValue;
> /* 010 */   private Object[] values;
> /* 011 */   private boolean resultIsNull1;
> /* 012 */   private scala.collection.Seq argValue1;
> /* 013 */   private boolean isNull11;
> /* 014 */   private boolean value11;
> /* 015 */   private boolean isNull12;
> /* 016 */   private InternalRow value12;
> /* 017 */   private boolean isNull13;
> /* 018 */   private InternalRow value13;
> /* 019 */   private UnsafeRow result;
> /* 020 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder holder;
> /* 021 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter;
> /* 022 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter;
> /* 023 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter1;
> /* 024 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter1;
> /* 025 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter2;
> /* 026 */
> /* 027 */   public SpecificUnsafeProjection(Object[] references) {
> /* 028 */ this.references = references;
> /* 029 */
> /* 030 */
> /* 031 */ this.values = null;
> /* 032 */
> /* 033 */
> /* 034 */ isNull11 = false;
> /* 035 */ value11 = false;
> /* 036 */ isNull12 = false;
> /* 037 */ value12 = null;
> /* 038 */ isNull13 = false;
> /* 039 */ value13 = null;
> /* 040 */ result = new UnsafeRow(1);
> /* 041 */ this.holder = new 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(result, 32);
> /* 042 */ this.rowWriter = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1);
> /* 043 */ this.arrayWriter = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 044 */ this.arrayWriter1 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 045 */ this.

[jira] [Comment Edited] (SPARK-21391) Cannot convert a Seq of Map whose value type is again a seq, into a dataset

2017-07-12 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16084333#comment-16084333
 ] 

Kazuaki Ishizaki edited comment on SPARK-21391 at 7/12/17 5:19 PM:
---

This program works with the master and Spark 2.2. Would it be possible to use 
Spark 2.2?

{code}
++
|  properties|
++
|Map(A1 -> [Wrappe...|
|Map(A2 -> [Wrappe...|
++
{code}



was (Author: kiszk):
This program works with the master.

{code}
++
|  properties|
++
|Map(A1 -> [Wrappe...|
|Map(A2 -> [Wrappe...|
++
{code}


> Cannot convert a Seq of Map whose value type is again a seq, into a dataset 
> 
>
> Key: SPARK-21391
> URL: https://issues.apache.org/jira/browse/SPARK-21391
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
> Environment: Seen on mac OSX, scala 2.11, java 8
>Reporter: indraneel rao
>
> There is an error while trying to create a dataset from a sequence of Maps, 
> whose values have any kind of collections. Even when they are wrapped in a 
> case class. 
> Eg : The following piece of code throws an error:
>
> {code:java}
> case class Values(values: Seq[Double])
> case class ItemProperties(properties:Map[String,Values])
> val l1 = List(ItemProperties(
>   Map(
> "A1" -> Values(Seq(1.0,2.0)),
> "B1" -> Values(Seq(44.0,55.0))
>   )
> ),
>   ItemProperties(
> Map(
>   "A2" -> Values(Seq(123.0,25.0)),
>   "B2" -> Values(Seq(445.0,35.0))
> )
>   )
> )
> l1.toDS().show()
> {code}
> Here's the error:
> 17/07/12 21:59:35 ERROR CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 65, Column 46: Expression "ExternalMapToCatalyst_value_isNull0" is not an 
> rvalue
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */   return new SpecificUnsafeProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificUnsafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */   private boolean resultIsNull;
> /* 009 */   private java.lang.String argValue;
> /* 010 */   private Object[] values;
> /* 011 */   private boolean resultIsNull1;
> /* 012 */   private scala.collection.Seq argValue1;
> /* 013 */   private boolean isNull11;
> /* 014 */   private boolean value11;
> /* 015 */   private boolean isNull12;
> /* 016 */   private InternalRow value12;
> /* 017 */   private boolean isNull13;
> /* 018 */   private InternalRow value13;
> /* 019 */   private UnsafeRow result;
> /* 020 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder holder;
> /* 021 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter;
> /* 022 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter;
> /* 023 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter1;
> /* 024 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter1;
> /* 025 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter2;
> /* 026 */
> /* 027 */   public SpecificUnsafeProjection(Object[] references) {
> /* 028 */ this.references = references;
> /* 029 */
> /* 030 */
> /* 031 */ this.values = null;
> /* 032 */
> /* 033 */
> /* 034 */ isNull11 = false;
> /* 035 */ value11 = false;
> /* 036 */ isNull12 = false;
> /* 037 */ value12 = null;
> /* 038 */ isNull13 = false;
> /* 039 */ value13 = null;
> /* 040 */ result = new UnsafeRow(1);
> /* 041 */ this.holder = new 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(result, 32);
> /* 042 */ this.rowWriter = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1);
> /* 043 */ this.arrayWriter = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 044 */ this.arrayWriter1 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 045 */ this.rowWriter1 = new 
> org.apache.spark.sql.cat

[jira] [Commented] (SPARK-21391) Cannot convert a Seq of Map whose value type is again a seq, into a dataset

2017-07-12 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16084333#comment-16084333
 ] 

Kazuaki Ishizaki commented on SPARK-21391:
--

This program works with the master.

{code}
++
|  properties|
++
|Map(A1 -> [Wrappe...|
|Map(A2 -> [Wrappe...|
++
{code}


> Cannot convert a Seq of Map whose value type is again a seq, into a dataset 
> 
>
> Key: SPARK-21391
> URL: https://issues.apache.org/jira/browse/SPARK-21391
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
> Environment: Seen on mac OSX, scala 2.11, java 8
>Reporter: indraneel rao
>
> There is an error while trying to create a dataset from a sequence of Maps, 
> whose values have any kind of collections. Even when they are wrapped in a 
> case class. 
> Eg : The following piece of code throws an error:
>
> {code:java}
> case class Values(values: Seq[Double])
> case class ItemProperties(properties:Map[String,Values])
> val l1 = List(ItemProperties(
>   Map(
> "A1" -> Values(Seq(1.0,2.0)),
> "B1" -> Values(Seq(44.0,55.0))
>   )
> ),
>   ItemProperties(
> Map(
>   "A2" -> Values(Seq(123.0,25.0)),
>   "B2" -> Values(Seq(445.0,35.0))
> )
>   )
> )
> l1.toDS().show()
> {code}
> Here's the error:
> 17/07/12 21:59:35 ERROR CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 65, Column 46: Expression "ExternalMapToCatalyst_value_isNull0" is not an 
> rvalue
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */   return new SpecificUnsafeProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificUnsafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.UnsafeProjection {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */   private boolean resultIsNull;
> /* 009 */   private java.lang.String argValue;
> /* 010 */   private Object[] values;
> /* 011 */   private boolean resultIsNull1;
> /* 012 */   private scala.collection.Seq argValue1;
> /* 013 */   private boolean isNull11;
> /* 014 */   private boolean value11;
> /* 015 */   private boolean isNull12;
> /* 016 */   private InternalRow value12;
> /* 017 */   private boolean isNull13;
> /* 018 */   private InternalRow value13;
> /* 019 */   private UnsafeRow result;
> /* 020 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder holder;
> /* 021 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter;
> /* 022 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter;
> /* 023 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter1;
> /* 024 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter rowWriter1;
> /* 025 */   private 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter 
> arrayWriter2;
> /* 026 */
> /* 027 */   public SpecificUnsafeProjection(Object[] references) {
> /* 028 */ this.references = references;
> /* 029 */
> /* 030 */
> /* 031 */ this.values = null;
> /* 032 */
> /* 033 */
> /* 034 */ isNull11 = false;
> /* 035 */ value11 = false;
> /* 036 */ isNull12 = false;
> /* 037 */ value12 = null;
> /* 038 */ isNull13 = false;
> /* 039 */ value13 = null;
> /* 040 */ result = new UnsafeRow(1);
> /* 041 */ this.holder = new 
> org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder(result, 32);
> /* 042 */ this.rowWriter = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1);
> /* 043 */ this.arrayWriter = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 044 */ this.arrayWriter1 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 045 */ this.rowWriter1 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter(holder, 1);
> /* 046 */ this.arrayWriter2 = new 
> org.apache.spark.sql.catalyst.expressions.codegen.UnsafeArrayWriter();
> /* 047 */
> /* 048 */   }
> /* 049 */
> /* 050 */   public void initialize(int partitionIndex) {
> /* 0

[jira] [Commented] (SPARK-21390) Dataset filter api inconsistency

2017-07-12 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16084306#comment-16084306
 ] 

Kazuaki Ishizaki commented on SPARK-21390:
--

Another interesting result with Spark 2.2:
On IDE
{code:java}
{
...
filterMe1.filter(x=> filterCondition.contains(x)).show
filterMe1.filter(x=> filterCondition.contains(SomeClass(x.field1, 
x.field2))).show
}

+--+--+
|field1|field2|
+--+--+
|00|01|
+--+--+

+--+--+
|field1|field2|
+--+--+
|00|01|
+--+--+
{code}

On REPL
{code:java}
...
scala> filterMe1.filter(x => filterCondition.contains(x)).show
+--+--+
|field1|field2|
+--+--+
|00|01|
+--+--+

scala> filterMe1.filter(x => filterCondition.contains(SomeClass(x.field1, 
x.field2))).show
+--+--+
|field1|field2|
+--+--+
+--+--+

scala> print(filterCondition.contains(SomeClass("00", "01")))
true

scala> filterMe1.filter(x => { val c = 
filterCondition.contains(SomeClass(x.field1, x.field2)); print(s"$c\n"); c} 
).show
false
+--+--+
|field1|field2|
+--+--+
+--+--+
{code}
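
(For reference, a hedged sketch of the workaround the reporter notes in the quoted issue below: mapping the Dataset into the comparison type before filtering. It assumes the {{SomeClass}}, {{OtherClass}}, {{filterCondition}} and {{filterMe2}} definitions from the report and a spark-shell session with implicits imported; it is not a fix for the underlying inconsistency.)

{code:java}
// Sketch of the workaround from the report below: convert to the type used in
// filterCondition before calling contains, then count.
val workaroundCount = filterMe2
  .map(x => SomeClass(x.field1, x.field2))
  .filter(filterCondition.contains(_))
  .count()
println(s"count = $workaroundCount")  // expected: 1
{code}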

> Dataset filter api inconsistency
> 
>
> Key: SPARK-21390
> URL: https://issues.apache.org/jira/browse/SPARK-21390
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1
>Reporter: Gheorghe Gheorghe
>Priority: Minor
>
> Hello everybody, 
> I've encountered a strange situation with the spark-shell.
> When I run the code below in my IDE, the second test case prints the expected 
> count "1". However, when I run the same code using the spark-shell, the second 
> test case returns a count of 0. 
> I've made sure that I'm running scala 2.11.8 and spark 2.0.1 in both my IDE 
> and spark-shell. 
> {code:java}
>   import org.apache.spark.sql.Dataset
>   case class SomeClass(field1:String, field2:String)
>   val filterCondition: Seq[SomeClass] = Seq( SomeClass("00", "01") )
>   // Test 1
>   val filterMe1: Dataset[SomeClass] = Seq( SomeClass("00", "01") ).toDS
>   
>   println("Works fine!" +filterMe1.filter(filterCondition.contains(_)).count)
>   
>   // Test 2
>   case class OtherClass(field1:String, field2:String)
>   
>   val filterMe2 = Seq( OtherClass("00", "01"), OtherClass("00", "02")).toDS
>   println("Fail, count should return 1: " + filterMe2.filter(x=> 
> filterCondition.contains(SomeClass(x.field1, x.field2))).count)
> {code}
> Note if I transform the dataset first I get 1 back as expected.
> {code:java}
>  println(filterMe2.map(x=> SomeClass(x.field1, 
> x.field2)).filter(filterCondition.contains(_)).count)
> {code}
> Is this a bug? I can see that this filter function has been marked as 
> experimental 
> https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-21390) Dataset filter api inconsistency

2017-07-12 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16084306#comment-16084306
 ] 

Kazuaki Ishizaki edited comment on SPARK-21390 at 7/12/17 5:09 PM:
---

Another interesting result with Spark 2.2. Does this happen only for case classes on the REPL?

On IDE
{code:java}
{
...
filterMe1.filter(x=> filterCondition.contains(x)).show
filterMe1.filter(x=> filterCondition.contains(SomeClass(x.field1, 
x.field2))).show
}

+--+--+
|field1|field2|
+--+--+
|00|01|
+--+--+

+--+--+
|field1|field2|
+--+--+
|00|01|
+--+--+
{code}

On REPL
{code:java}
...
scala> filterMe1.filter(x => filterCondition.contains(x)).show
+--+--+
|field1|field2|
+--+--+
|00|01|
+--+--+

scala> filterMe1.filter(x => filterCondition.contains(SomeClass(x.field1, 
x.field2))).show
+--+--+
|field1|field2|
+--+--+
+--+--+

scala> print(filterCondition.contains(SomeClass("00", "01")))
true

scala> filterMe1.filter(x => { val c = 
filterCondition.contains(SomeClass(x.field1, x.field2)); print(s"$c\n"); c} 
).show
false
+--+--+
|field1|field2|
+--+--+
+--+--+

scala> Seq((0, 0), (1, 1), (2, 2)).toDS.filter(x => { val c = Seq((1, 
1)).contains((x._1, x._2)); print(s"$c\n"); c} ).show
false
true
false
+---+---+
| _1| _2|
+---+---+
|  1|  1|
+---+---+
{code}


was (Author: kiszk):
Another interesting result with Spark 2.2:
On IDE
{code:java}
{
...
filterMe1.filter(x=> filterCondition.contains(x)).show
filterMe1.filter(x=> filterCondition.contains(SomeClass(x.field1, 
x.field2))).show
}

+--+--+
|field1|field2|
+--+--+
|00|01|
+--+--+

+--+--+
|field1|field2|
+--+--+
|00|01|
+--+--+
{code}

On REPL
{code:java}
...
scala> filterMe1.filter(x => filterCondition.contains(x)).show
+--+--+
|field1|field2|
+--+--+
|00|01|
+--+--+

scala> filterMe1.filter(x => filterCondition.contains(SomeClass(x.field1, 
x.field2))).show
+--+--+
|field1|field2|
+--+--+
+--+--+

scala> print(filterCondition.contains(SomeClass("00", "01")))
true

scala> filterMe1.filter(x => { val c = 
filterCondition.contains(SomeClass(x.field1, x.field2)); print(s"$c\n"); c} 
).show
false
+--+--+
|field1|field2|
+--+--+
+--+--+
{code}

> Dataset filter api inconsistency
> 
>
> Key: SPARK-21390
> URL: https://issues.apache.org/jira/browse/SPARK-21390
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1
>Reporter: Gheorghe Gheorghe
>Priority: Minor
>
> Hello everybody, 
> I've encountered a strange situation with the spark-shell.
> When I run the code below in my IDE, the second test case prints the expected 
> count "1". However, when I run the same code using the spark-shell, the second 
> test case returns a count of 0. 
> I've made sure that I'm running scala 2.11.8 and spark 2.0.1 in both my IDE 
> and spark-shell. 
> {code:java}
>   import org.apache.spark.sql.Dataset
>   case class SomeClass(field1:String, field2:String)
>   val filterCondition: Seq[SomeClass] = Seq( SomeClass("00", "01") )
>   // Test 1
>   val filterMe1: Dataset[SomeClass] = Seq( SomeClass("00", "01") ).toDS
>   
>   println("Works fine!" +filterMe1.filter(filterCondition.contains(_)).count)
>   
>   // Test 2
>   case class OtherClass(field1:String, field2:String)
>   
>   val filterMe2 = Seq( OtherClass("00", "01"), OtherClass("00", "02")).toDS
>   println("Fail, count should return 1: " + filterMe2.filter(x=> 
> filterCondition.contains(SomeClass(x.field1, x.field2))).count)
> {code}
> Note if I transform the dataset first I get 1 back as expected.
> {code:java}
>  println(filterMe2.map(x=> SomeClass(x.field1, 
> x.field2)).filter(filterCondition.contains(_)).count)
> {code}
> Is this a bug? I can see that this filter function has been marked as 
> experimental 
> https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21390) Dataset filter api inconsistency

2017-07-12 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16084266#comment-16084266
 ] 

Kazuaki Ishizaki commented on SPARK-21390:
--

Thank you for reporting this. I can reproduce this using Spark 2.2, too.

{code:java}
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/
 
Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131)
Type in expressions to have them evaluated.
Type :help for more information.

scala> import org.apache.spark.sql.Dataset
import org.apache.spark.sql.Dataset

scala> case class SomeClass(field1:String, field2:String)
defined class SomeClass

scala> val filterCondition: Seq[SomeClass] = Seq( SomeClass("00", "01") )
filterCondition: Seq[SomeClass] = List(SomeClass(00,01))

scala> val filterMe1: Dataset[SomeClass] = Seq( SomeClass("00", "01") ).toDS
filterMe1: org.apache.spark.sql.Dataset[SomeClass] = [field1: string, field2: 
string]

scala> println("Works fine!" 
+filterMe1.filter(filterCondition.contains(_)).count)
Works fine!1

scala> case class OtherClass(field1:String, field2:String)
defined class OtherClass

scala> val filterMe2 = Seq( OtherClass("00", "01"), OtherClass("00", "02")).toDS
filterMe2: org.apache.spark.sql.Dataset[OtherClass] = [field1: string, field2: 
string]

scala> println("Fail, count should return 1: " + filterMe2.filter(x=> 
filterCondition.contains(SomeClass(x.field1, x.field2))).count)
Fail, count should return 1: 0
{code}

> Dataset filter api inconsistency
> 
>
> Key: SPARK-21390
> URL: https://issues.apache.org/jira/browse/SPARK-21390
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1
>Reporter: Gheorghe Gheorghe
>Priority: Minor
>
> Hello everybody, 
> I've encountered a strange situation with the spark-shell.
> When I run the code below in my IDE, the second test case prints the expected 
> count "1". However, when I run the same code using the spark-shell, the second 
> test case returns a count of 0. 
> I've made sure that I'm running scala 2.11.8 and spark 2.0.1 in both my IDE 
> and spark-shell. 
> {code:java}
>   import org.apache.spark.sql.Dataset
>   case class SomeClass(field1:String, field2:String)
>   val filterCondition: Seq[SomeClass] = Seq( SomeClass("00", "01") )
>   // Test 1
>   val filterMe1: Dataset[SomeClass] = Seq( SomeClass("00", "01") ).toDS
>   
>   println("Works fine!" +filterMe1.filter(filterCondition.contains(_)).count)
>   
>   // Test 2
>   case class OtherClass(field1:String, field2:String)
>   
>   val filterMe2 = Seq( OtherClass("00", "01"), OtherClass("00", "02")).toDS
>   println("Fail, count should return 1: " + filterMe2.filter(x=> 
> filterCondition.contains(SomeClass(x.field1, x.field2))).count)
> {code}
> Note if I transform the dataset first I get 1 back as expected.
> {code:java}
>  println(filterMe2.map(x=> SomeClass(x.field1, 
> x.field2)).filter(filterCondition.contains(_)).count)
> {code}
> Is this a bug? I can see that this filter function has been marked as 
> experimental 
> https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Dataset.html#filter(scala.Function1)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21387) org.apache.spark.memory.TaskMemoryManager.allocatePage causes OOM

2017-07-12 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-21387:


 Summary: org.apache.spark.memory.TaskMemoryManager.allocatePage 
causes OOM
 Key: SPARK-21387
 URL: https://issues.apache.org/jira/browse/SPARK-21387
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.3.0
Reporter: Kazuaki Ishizaki






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-21373) Update Jetty to 9.3.20.v20170531

2017-07-11 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082426#comment-16082426
 ] 

Kazuaki Ishizaki edited comment on SPARK-21373 at 7/11/17 3:56 PM:
---

Since I have not clarified it yet, I changed the title.


was (Author: kiszk):
Since I have not clarified it yet, I changed the title. I will submit a PR for 
the improvement.

> Update Jetty to 9.3.20.v20170531
> 
>
> Key: SPARK-21373
> URL: https://issues.apache.org/jira/browse/SPARK-21373
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Kazuaki Ishizaki
>Priority: Minor
>
> This is derived from https://issues.apache.org/jira/browse/FELIX-5664. 
> [~aroberts] let me know about the CVE.
> Spark 2.2 uses jetty 9.3.11.v20160721, which is vulnerable to CVE-2017-9735:
> * https://nvd.nist.gov/vuln/detail/CVE-2017-9735
> * https://github.com/eclipse/jetty.project/issues/1556
> We should upgrade jetty to 9.3.20.v20170531, which has been released to fix the CVE.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21373) Update Jetty to 9.3.20.v20170531

2017-07-11 Thread Kazuaki Ishizaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki updated SPARK-21373:
-
Summary: Update Jetty to 9.3.20.v20170531  (was: Update Jetty to 
9.3.20.v20170531 to fix CVE-2017-9735)

> Update Jetty to 9.3.20.v20170531
> 
>
> Key: SPARK-21373
> URL: https://issues.apache.org/jira/browse/SPARK-21373
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Kazuaki Ishizaki
>Priority: Minor
>
> This is derived from https://issues.apache.org/jira/browse/FELIX-5664. 
> [~aroberts] let me know about the CVE.
> Spark 2.2 uses jetty 9.3.11.v20160721, which is vulnerable to CVE-2017-9735:
> * https://nvd.nist.gov/vuln/detail/CVE-2017-9735
> * https://github.com/eclipse/jetty.project/issues/1556
> We should upgrade jetty to 9.3.20.v20170531, which has been released to fix the CVE.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21373) Update Jetty to 9.3.20.v20170531

2017-07-11 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16082426#comment-16082426
 ] 

Kazuaki Ishizaki commented on SPARK-21373:
--

Since I have not clarified it yet, I changed the title. I will submit a PR for 
the improvement.

> Update Jetty to 9.3.20.v20170531
> 
>
> Key: SPARK-21373
> URL: https://issues.apache.org/jira/browse/SPARK-21373
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Kazuaki Ishizaki
>Priority: Minor
>
> This is derived from https://issues.apache.org/jira/browse/FELIX-5664. 
> [~aroberts] let me know about the CVE.
> Spark 2.2 uses jetty 9.3.11.v20160721, which is vulnerable to CVE-2017-9735:
> * https://nvd.nist.gov/vuln/detail/CVE-2017-9735
> * https://github.com/eclipse/jetty.project/issues/1556
> We should upgrade jetty to 9.3.20.v20170531, which has been released to fix the CVE.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21373) Update Jetty to 9.3.20.v20170531 to fix CVE-2017-9735

2017-07-11 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-21373:


 Summary: Update Jetty to 9.3.20.v20170531 to fix CVE-2017-9735
 Key: SPARK-21373
 URL: https://issues.apache.org/jira/browse/SPARK-21373
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.2.1, 2.3.0
Reporter: Kazuaki Ishizaki


This is derived from https://issues.apache.org/jira/browse/FELIX-5664

Spark 2.2 uses jetty 9.3.11.v20160721, which is vulnerable to CVE-2017-9735:
* https://nvd.nist.gov/vuln/detail/CVE-2017-9735
* https://github.com/eclipse/jetty.project/issues/1556

We should upgrade jetty to 9.3.20.v20170531, which has been released to fix the CVE.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21373) Update Jetty to 9.3.20.v20170531 to fix CVE-2017-9735

2017-07-11 Thread Kazuaki Ishizaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Ishizaki updated SPARK-21373:
-
Description: 
This is derived from https://issues.apache.org/jira/browse/FELIX-5664. 
[~aroberts] let me know about the CVE.

Spark 2.2 uses jetty 9.3.11.v20160721, which is vulnerable to CVE-2017-9735:
* https://nvd.nist.gov/vuln/detail/CVE-2017-9735
* https://github.com/eclipse/jetty.project/issues/1556

We should upgrade jetty to 9.3.20.v20170531, which has been released to fix the CVE.

  was:
This is derived from https://issues.apache.org/jira/browse/FELIX-5664

Spark 2.2 uses jetty 9.3.11.v20160721, which is vulnerable to CVE-2017-9735:
* https://nvd.nist.gov/vuln/detail/CVE-2017-9735
* https://github.com/eclipse/jetty.project/issues/1556

We should upgrade jetty to 9.3.20.v20170531, which has been released to fix the CVE.


> Update Jetty to 9.3.20.v20170531 to fix CVE-2017-9735
> -
>
> Key: SPARK-21373
> URL: https://issues.apache.org/jira/browse/SPARK-21373
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.1, 2.3.0
>Reporter: Kazuaki Ishizaki
>
> This is derived from https://issues.apache.org/jira/browse/FELIX-5664. 
> [~aroberts] let me know about the CVE.
> Spark 2.2 uses jetty 9.3.11.v20160721, which is vulnerable to CVE-2017-9735:
> * https://nvd.nist.gov/vuln/detail/CVE-2017-9735
> * https://github.com/eclipse/jetty.project/issues/1556
> We should upgrade jetty to 9.3.20.v20170531, which has been released to fix the CVE.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21364) IndexOutOfBoundsException on equality check of two complex array elements

2017-07-10 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080963#comment-16080963
 ] 

Kazuaki Ishizaki commented on SPARK-21364:
--

When I ran the following test case, derived from the repro, I got the result 
without any exception on the master branch or 2.1.1.
Am I making a mistake?

{code}
  test("SPARK-21364") {
val data = Seq(
  "{\"menu\":{\"id\":\"file\",\"value\":\"File\",\"popup\":{\"menuitem\":[" 
+
"{\"value\":\"New\",\"onclick\":\"CreateNewDoc()\"}," +
"{\"value\":\"Open\",\"onclick\":\"OpenDoc()\"}, " +
"{\"value\":\"Close\",\"onclick\":\"CloseDoc()\"}" +
"]}}}")
val df = sqlContext.read.json(sparkContext.parallelize(data))
df.select($"menu.popup.menuitem"(lit(0)). === 
($"menu.popup.menuitem"(lit(1.show
  }
{code}

{code}
+-+
|(menu.popup.menuitem[0] = menu.popup.menuitem[1])|
+-+
|false|
+-+
{code}

> IndexOutOfBoundsException on equality check of two complex array elements
> -
>
> Key: SPARK-21364
> URL: https://issues.apache.org/jira/browse/SPARK-21364
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Vivek Patangiwar
>Priority: Minor
>
> Getting an IndexOutOfBoundsException with the following code:
> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.SparkSession
> object ArrayEqualityTest {
>   def main(s:Array[String]) {
> val sparkSession = 
> SparkSession.builder().master("local[*]").appName("app").getOrCreate()
> val sqlContext = sparkSession.sqlContext
> val sc = sparkSession.sqlContext.sparkContext
> import sparkSession.implicits._
> val df = 
> sqlContext.read.json(sc.parallelize(Seq("{\"menu\":{\"id\":\"file\",\"value\":\"File\",\"popup\":{\"menuitem\":[{\"value\":\"New\",\"onclick\":\"CreateNewDoc()\"},{\"value\":\"Open\",\"onclick\":\"OpenDoc()\"},{\"value\":\"Close\",\"onclick\":\"CloseDoc()\"}]}}}")))
> 
> df.select($"menu.popup.menuitem"(lit(0)).===($"menu.popup.menuitem"(lit(1.show
>   }
> }
> Here's the complete stack-trace:
> Exception in thread "main" java.lang.IndexOutOfBoundsException: 1
>   at 
> scala.collection.LinearSeqOptimized$class.apply(LinearSeqOptimized.scala:65)
>   at scala.collection.immutable.List.apply(List.scala:84)
>   at 
> org.apache.spark.sql.catalyst.expressions.BoundReference.doGenCode(BoundAttribute.scala:64)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:104)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:101)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:101)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$$anonfun$3.apply(GenerateOrdering.scala:76)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$$anonfun$3.apply(GenerateOrdering.scala:75)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.genComparisons(GenerateOrdering.scala:75)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.GenerateOrdering$.genComparisons(GenerateOrdering.scala:68)
>

[jira] [Commented] (SPARK-21337) SQL which has large ‘case when’ expressions may cause code generation beyond 64KB

2017-07-08 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079032#comment-16079032
 ] 

Kazuaki Ishizaki commented on SPARK-21337:
--

I cannot reproduce this using the latest code or the v2.1 tag in branch-2.1, either.
Is this issue specific to CDH?

> SQL which has large ‘case when’ expressions may cause code generation beyond 
> 64KB
> -
>
> Key: SPARK-21337
> URL: https://issues.apache.org/jira/browse/SPARK-21337
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1
> Environment: spark-2.1.1-hadoop-2.6.0-cdh-5.4.2
>Reporter: fengchaoge
> Fix For: 2.1.1
>
>
> When there are large 'case when' expressions in Spark SQL, the CodeGenerator 
> fails to compile them. 
> The error message is followed by a huge dump of generated source code, and 
> compilation ultimately fails.
> java.util.concurrent.ExecutionException: java.lang.Exception: failed to 
> compile: org.codehaus.janino.JaninoRuntimeException: Code of method 
> apply_9$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V
>  of class 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection
>  grows beyond 64 KB.
> It seems that SPARK-13242 solved this problem in spark-1.6.2; however, it 
> appears in spark-2.1.1 again. 
> https://issues.apache.org/jira/browse/SPARK-13242.
> Is there something wrong?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21344) BinaryType comparison does signed byte array comparison

2017-07-08 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079010#comment-16079010
 ] 

Kazuaki Ishizaki commented on SPARK-21344:
--

I will work on this unless someone has already finished a PR.
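
For illustration only, a minimal self-contained sketch (not Spark code) contrasting signed and unsigned lexicographic byte-array comparison; it shows why arrays whose first differing byte is 0x80 or above are misordered under signed comparison, which is the behavior described in the report below.

{code:java}
object ByteArrayOrderingSketch {
  // Signed lexicographic comparison: each byte is compared as a value in [-128, 127].
  def compareSigned(a: Array[Byte], b: Array[Byte]): Int = {
    var i = 0
    while (i < math.min(a.length, b.length)) {
      val c = java.lang.Byte.compare(a(i), b(i))
      if (c != 0) return c
      i += 1
    }
    a.length - b.length
  }

  // Unsigned lexicographic comparison: each byte is compared as a value in [0, 255].
  def compareUnsigned(a: Array[Byte], b: Array[Byte]): Int = {
    var i = 0
    while (i < math.min(a.length, b.length)) {
      val c = (a(i) & 0xff) - (b(i) & 0xff)
      if (c != 0) return c
      i += 1
    }
    a.length - b.length
  }

  def main(args: Array[String]): Unit = {
    val x = Array[Byte](0x7f)        // 127
    val y = Array[Byte](0x80.toByte) // 128 unsigned, -128 signed
    println(compareSigned(x, y) > 0)   // true: signed ordering puts 0x80... before 0x7f...
    println(compareUnsigned(x, y) < 0) // true: unsigned ordering puts 0x7f... before 0x80...
  }
}
{code}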

> BinaryType comparison does signed byte array comparison
> ---
>
> Key: SPARK-21344
> URL: https://issues.apache.org/jira/browse/SPARK-21344
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.1.1
>Reporter: Shubham Chopra
>
> BinaryType used by Spark SQL defines ordering using signed byte comparisons. 
> This can lead to unexpected behavior. Consider the following code snippet 
> that shows this error:
> {code}
> case class TestRecord(col0: Array[Byte])
> def convertToBytes(i: Long): Array[Byte] = {
> val bb = java.nio.ByteBuffer.allocate(8)
> bb.putLong(i)
> bb.array
>   }
> def test = {
> val sql = spark.sqlContext
> import sql.implicits._
> val timestamp = 1498772083037L
> val data = (timestamp to timestamp + 1000L).map(i => 
> TestRecord(convertToBytes(i)))
> val testDF = sc.parallelize(data).toDF
> val filter1 = testDF.filter(col("col0") >= convertToBytes(timestamp) && 
> col("col0") < convertToBytes(timestamp + 50L))
> val filter2 = testDF.filter(col("col0") >= convertToBytes(timestamp + 
> 50L) && col("col0") < convertToBytes(timestamp + 100L))
> val filter3 = testDF.filter(col("col0") >= convertToBytes(timestamp) && 
> col("col0") < convertToBytes(timestamp + 100L))
> assert(filter1.count == 50)
> assert(filter2.count == 50)
> assert(filter3.count == 100)
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21337) SQL which has large ‘case when’ expressions may cause code generation beyond 64KB

2017-07-07 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16077952#comment-16077952
 ] 

Kazuaki Ishizaki commented on SPARK-21337:
--

In the master branch, I do not see a huge dump and did not get a failure.
If that is correct, should we backport a fix into 2.1.1?

{code}
  test("split complex single column expressions") {
val cases = 50
val conditionClauses = 20

// Generate an individual case
def generateCase(n: Int): (Expression, Expression) = {
  val condition = (1 to conditionClauses)
  .map(c => EqualTo(BoundReference(0, StringType, false), 
Literal(s"$c:$n")))
  .reduceLeft[Expression]((l, r) => Or(l, r))
  (condition, Literal(n))
}

val expression = CaseWhen((1 to cases).map(generateCase(_)))

// Currently this throws a java.util.concurrent.ExecutionException wrapping 
a
// org.codehaus.janino.JaninoRuntimeException: Code of method XXX of class 
YYY grows beyond 64 KB
val plan = GenerateMutableProjection.generate(Seq(expression))
  }
{code}
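
For illustration only, a sketch of the user-facing query shape the report below describes; the view name {{t}}, the column name {{c}} and the branch count are assumptions, not taken from the report.

{code:java}
// Build a SQL string with a very wide CASE WHEN; on affected builds this kind
// of query is reported to fail with "... grows beyond 64 KB" during codegen.
val branches = (1 to 500).map(i => s"WHEN c = '$i' THEN $i").mkString(" ")
val wideCaseWhen = s"SELECT CASE $branches ELSE 0 END AS bucket FROM t"
// spark.sql(wideCaseWhen).collect()  // assumes a SparkSession `spark` and a view `t`
{code}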

> SQL which has large ‘case when’ expressions may cause code generation beyond 
> 64KB
> -
>
> Key: SPARK-21337
> URL: https://issues.apache.org/jira/browse/SPARK-21337
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1
> Environment: spark-2.1.1-hadoop-2.6.0-cdh-5.4.2
>Reporter: fengchaoge
> Fix For: 2.1.1
>
>
> When there are large 'case when' expressions in Spark SQL, the CodeGenerator 
> fails to compile them. 
> The error message is followed by a huge dump of generated source code, and 
> compilation ultimately fails.
> java.util.concurrent.ExecutionException: java.lang.Exception: failed to 
> compile: org.codehaus.janino.JaninoRuntimeException: Code of method 
> apply_9$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$SpecificUnsafeProjection;Lorg/apache/spark/sql/catalyst/InternalRow;)V
>  of class 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection
>  grows beyond 64 KB.
> It seems like SPARK-13242 solved this problem in spark-1.6.1; however, it 
> appears in spark-2.1.1 again. 
> https://issues.apache.org/jira/browse/SPARK-13242.
> Is there something wrong?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [VOTE] Apache Spark 2.2.0 (RC6)

2017-07-01 Thread Kazuaki Ishizaki
+1 (non-binding)

I tested it on Ubuntu 16.04 and OpenJDK8 on ppc64le. All of the tests for 
core/sql-core/sql-catalyst/mllib/mllib-local have passed.

$ java -version
openjdk version "1.8.0_111"
OpenJDK Runtime Environment (build 
1.8.0_111-8u111-b14-2ubuntu0.16.04.2-b14)
OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)

% build/mvn -DskipTests -Phive -Phive-thriftserver -Pyarn -Phadoop-2.7 -T 
24 clean package install
% build/mvn -Phive -Phive-thriftserver -Pyarn -Phadoop-2.7 test -pl core 
-pl 'sql/core' -pl 'sql/catalyst' -pl mllib -pl mllib-local
...
Run completed in 15 minutes, 3 seconds.
Total number of tests run: 1113
Suites: completed 170, aborted 0
Tests: succeeded 1113, failed 0, canceled 0, ignored 6, pending 0
All tests passed.
[INFO] 

[INFO] Reactor Summary:
[INFO] 
[INFO] Spark Project Core . SUCCESS [17:24 
min]
[INFO] Spark Project ML Local Library . SUCCESS [ 
7.161 s]
[INFO] Spark Project Catalyst . SUCCESS [11:55 
min]
[INFO] Spark Project SQL .. SUCCESS [18:38 
min]
[INFO] Spark Project ML Library ... SUCCESS [18:17 
min]
[INFO] 

[INFO] BUILD SUCCESS
[INFO] 

[INFO] Total time: 01:06 h
[INFO] Finished at: 2017-07-01T15:20:04+09:00
[INFO] Final Memory: 56M/591M
[INFO] 

[WARNING] The requested profile "hive" could not be activated because it 
does not exist.

Kazuaki Ishizaki




From:   Michael Armbrust <mich...@databricks.com>
To: "dev@spark.apache.org" <dev@spark.apache.org>
Date:   2017/07/01 10:45
Subject:[VOTE] Apache Spark 2.2.0 (RC6)



Please vote on releasing the following candidate as Apache Spark version 
2.2.0. The vote is open until Friday, July 7th, 2017 at 18:00 PST and 
passes if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.2.0
[ ] -1 Do not release this package because ...


To learn more about Apache Spark, please see https://spark.apache.org/

The tag to be voted on is v2.2.0-rc6 (
a2c7b2133cfee7fa9abfaa2bfbfb637155466783)

List of JIRA tickets resolved can be found with this filter.

The release files, including signatures, digests, etc. can be found at:
https://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc6-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1245/

The documentation corresponding to this release can be found at:
https://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc6-docs/


FAQ

How can I help test this release?

If you are a Spark user, you can help us test this release by taking an 
existing Spark workload and running on this release candidate, then 
reporting any regressions.

What should happen to JIRA tickets still targeting 2.2.0?

Committers should look at those and triage. Extremely important bug fixes, 
documentation, and API tweaks that impact compatibility should be worked 
on immediately. Everything else please retarget to 2.3.0 or 2.2.1.

But my bug isn't fixed!??!

In order to make timely releases, we will typically not hold the release 
unless the bug in question is a regression from 2.1.1.




[jira] [Comment Edited] (SPARK-21271) UnsafeRow.hashCode assertion when sizeInBytes not multiple of 8

2017-06-30 Thread Kazuaki Ishizaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070973#comment-16070973
 ] 

Kazuaki Ishizaki edited comment on SPARK-21271 at 7/1/17 3:18 AM:
--

I see. I will work on this.
Thank you for letting us know how to fix it. I have been thinking about "+ 4". I also 
saw similar code 
[here|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java#L68]


was (Author: kiszk):
I see. I will work on this.
Thank you for letting us know how to fix it. I have been thinking about {{ + 4 }}. I 
also saw similar code 
[here|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/VariableLengthRowBasedKeyValueBatch.java#L68]

> UnsafeRow.hashCode assertion when sizeInBytes not multiple of 8
> ---
>
> Key: SPARK-21271
> URL: https://issues.apache.org/jira/browse/SPARK-21271
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Bogdan Raducanu
>
> The method is:
> {code}
> public int hashCode() {
> return Murmur3_x86_32.hashUnsafeWords(baseObject, baseOffset, 
> sizeInBytes, 42);
>   }
> {code}
> but sizeInBytes is not always a multiple of 8 (in which case hashUnsafeWords 
> throws an assertion error) - for example here: 
> {code}FixedLengthRowBasedKeyValueBatch.appendRow{code}
> The fix could be to use hashUnsafeBytes or to use hashUnsafeWords but on a 
> prefix that is a multiple of 8.
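
For illustration only, a minimal sketch of the word-aligned-prefix option mentioned in the description above; the helper name and the sample sizes are assumptions, not Spark code.

{code:java}
// Largest multiple of 8 that does not exceed sizeInBytes; hashUnsafeWords could
// then be applied to this prefix, as suggested above.
def wordAlignedPrefix(sizeInBytes: Int): Int = sizeInBytes & ~7

// e.g. rows whose sizeInBytes is 8 * n + 4 are rounded down to the previous
// word boundary (the "+ 4" case discussed in the comment above).
Seq(4, 12, 20, 36).foreach { size =>
  println(s"sizeInBytes=$size -> aligned prefix=${wordAlignedPrefix(size)}")
}
{code}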



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


