[jira] [Commented] (SPARK-20184) performance regression for complex/long sql when enable whole stage codegen

2017-04-05 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15956420#comment-15956420 ] Kazuaki Ishizaki commented on SPARK-20184: -- If the number of rows is smaller, how about

Re: how do i force unit test to do whole stage codegen

2017-04-04 Thread Kazuaki Ishizaki
opic for d...@spark.apache.org. Kazuaki Ishizaki From: Koert Kuipers <ko...@tresata.com> To: "user@spark.apache.org" <user@spark.apache.org> Date: 2017/04/05 05:12 Subject:how do i force unit test to do whole stage codegen i wrote my own expression with eval an

[jira] [Commented] (SPARK-20176) Spark Dataframe UDAF issue

2017-04-03 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15954093#comment-15954093 ] Kazuaki Ishizaki commented on SPARK-20176: -- Thanks. The code seems to work for the master. I am

[jira] [Comment Edited] (SPARK-20176) Spark Dataframe UDAF issue

2017-04-03 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15954093#comment-15954093 ] Kazuaki Ishizaki edited comment on SPARK-20176 at 4/3/17 8:13 PM

Re: [VOTE] Apache Spark 2.1.1 (RC2)

2017-04-02 Thread Kazuaki Ishizaki
Thank you. Yes, it is not a regression. 2.1.0 would have this failure, too. Regards, Kazuaki Ishizaki From: Sean Owen <so...@cloudera.com> To: Kazuaki Ishizaki/Japan/IBM@IBMJP, Michael Armbrust <mich...@databricks.com> Cc: "dev@spark.apache.org" <dev@spark

Re: [VOTE] Apache Spark 2.1.1 (RC2)

2017-04-02 Thread Kazuaki Ishizaki
0.002 sec <<< ERROR! java.lang.IllegalArgumentException: requirement failed: No support for unaligned Unsafe. Set spark.memory.offHeap.enabled to false. ... Tests run: 207, Failures: 7, Errors: 16, Skipped: 0 Kazuaki Ishizaki From: Michael Armbrust <mich...@databricks.com

[jira] [Commented] (SPARK-20176) Spark Dataframe UDAF issue

2017-03-31 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951473#comment-15951473 ] Kazuaki Ishizaki commented on SPARK-20176: -- Could you please post the program that can reproduce

[jira] [Comment Edited] (SPARK-20158) crash in Spark sql insert in partitioned hive tables

2017-03-30 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949364#comment-15949364 ] Kazuaki Ishizaki edited comment on SPARK-20158 at 3/30/17 4:35 PM

[jira] [Commented] (SPARK-20158) crash in Spark sql insert in partitioned hive tables

2017-03-30 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949364#comment-15949364 ] Kazuaki Ishizaki commented on SPARK-20158: -- What program did you run? > crash in Spark

[jira] [Comment Edited] (SPARK-19984) ERROR codegen.CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java'

2017-03-30 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949359#comment-15949359 ] Kazuaki Ishizaki edited comment on SPARK-19984 at 3/30/17 4:33 PM

[jira] [Commented] (SPARK-19984) ERROR codegen.CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java'

2017-03-30 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949359#comment-15949359 ] Kazuaki Ishizaki commented on SPARK-19984: -- While I tried a reproducible program, I have

[jira] [Commented] (SPARK-20112) SIGSEGV in GeneratedIterator.sort_addToSorter

2017-03-28 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1594#comment-1594 ] Kazuaki Ishizaki commented on SPARK-20112: -- [~MasterDDT] Thank you for preparing additional

[jira] [Commented] (SPARK-20112) SIGSEGV in GeneratedIterator.sort_addToSorter

2017-03-28 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15945389#comment-15945389 ] Kazuaki Ishizaki commented on SPARK-20112: -- SPARK-18745 fixed integer overflow issues

[jira] [Created] (SPARK-20101) Use OffHeapColumnVector when "spark.memory.offHeap.enabled" is set to "true"

2017-03-26 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-20101: Summary: Use OffHeapColumnVector when "spark.memory.offHeap.enabled" is set to "true" Key: SPARK-20101 URL: https://issues.apache.org/jir

[jira] [Commented] (SPARK-19372) Code generation for Filter predicate including many OR conditions exceeds JVM method size limit

2017-03-23 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15939511#comment-15939511 ] Kazuaki Ishizaki commented on SPARK-19372: -- I implemented the code to take care of it, and am

[jira] [Commented] (SPARK-14083) Analyze JVM bytecode and turn closures into Catalyst expressions

2017-03-23 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15937907#comment-15937907 ] Kazuaki Ishizaki commented on SPARK-14083: -- I agree with you. I do not think that current status

[jira] [Commented] (SPARK-14083) Analyze JVM bytecode and turn closures into Catalyst expressions

2017-03-23 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15937886#comment-15937886 ] Kazuaki Ishizaki commented on SPARK-14083: -- [~viirya] For a while, I will be able to update

[jira] [Commented] (SPARK-14083) Analyze JVM bytecode and turn closures into Catalyst expressions

2017-03-23 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15937883#comment-15937883 ] Kazuaki Ishizaki commented on SPARK-14083: -- [~maropu] Thanks. > Analyze JVM bytecode and t

[jira] [Commented] (SPARK-14083) Analyze JVM bytecode and turn closures into Catalyst expressions

2017-03-23 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15937783#comment-15937783 ] Kazuaki Ishizaki commented on SPARK-14083: -- [~viirya] Thank you for your comment. Good to hear

[jira] [Updated] (SPARK-20046) Facilitate loop optimizations in a JIT compiler regarding sqlContext.read.parquet()

2017-03-21 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki updated SPARK-20046: - Issue Type: Improvement (was: Bug) > Facilitate loop optimizations in a JIT compi

[jira] [Created] (SPARK-20046) Facilitate loop optimizations in a JIT compiler regarding sqlContext.read.parquet()

2017-03-21 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-20046: Summary: Facilitate loop optimizations in a JIT compiler regarding sqlContext.read.parquet() Key: SPARK-20046 URL: https://issues.apache.org/jira/browse/SPARK-20046

Re: Why are DataFrames always read with nullable=True?

2017-03-20 Thread Kazuaki Ishizaki
if null exists in non-null column. Any comments are appreciated. Kazuaki Ishizaki From: Jason White <jason.wh...@shopify.com> To: dev@spark.apache.org Date: 2017/03/21 06:31 Subject:Why are DataFrames always read with nullable=True? If I create a dataframe in Spark with non-nu

Re: [Spark SQL & Core]: RDD to Dataset 1500 columns data with createDataFrame() throw exception of grows beyond 64 KB

2017-03-18 Thread Kazuaki Ishizaki
) at org.codehaus.janino.util.ClassFile.addConstantNameAndTypeInfo(ClassFile.java:439) at org.codehaus.janino.util.ClassFile.addConstantMethodrefInfo(ClassFile.java:358) ... While this PR https://github.com/apache/spark/pull/16648 addresses the issue with the number of constant pool entries, it has not been merged yet. Regards, Kazuaki Ishizaki

[jira] [Comment Edited] (SPARK-19984) ERROR codegen.CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java'

2017-03-16 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929399#comment-15929399 ] Kazuaki Ishizaki edited comment on SPARK-19984 at 3/17/17 4:23 AM

[jira] [Commented] (SPARK-19984) ERROR codegen.CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java'

2017-03-16 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929399#comment-15929399 ] Kazuaki Ishizaki commented on SPARK-19984: -- This problem occurs since Spark generates {.compare

[jira] [Created] (SPARK-19959) df[java.lang.Long].collect throws NullPointerException if df includes null

2017-03-15 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-19959: Summary: df[java.lang.Long].collect throws NullPointerException if df includes null Key: SPARK-19959 URL: https://issues.apache.org/jira/browse/SPARK-19959

[jira] [Commented] (SPARK-19950) nullable ignored when df.load() is executed for file-based data source

2017-03-14 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15925357#comment-15925357 ] Kazuaki Ishizaki commented on SPARK-19950: -- [~hyukjin.kwon] Thank you for pointing out SPARK

[jira] [Created] (SPARK-19950) nullable ignored when df.load() is executed for file-based data source

2017-03-14 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-19950: Summary: nullable ignored when df.load() is executed for file-based data source Key: SPARK-19950 URL: https://issues.apache.org/jira/browse/SPARK-19950

[jira] [Commented] (SPARK-14083) Analyze JVM bytecode and turn closures into Catalyst expressions

2017-03-09 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15904460#comment-15904460 ] Kazuaki Ishizaki commented on SPARK-14083: -- I rebased this with master: https://github.com

[jira] [Commented] (SPARK-19875) Map->filter on many columns gets stuck in constraint inference optimization code

2017-03-09 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15903584#comment-15903584 ] Kazuaki Ishizaki commented on SPARK-19875: -- I got the following stack trace. This stuck seems

[jira] [Commented] (SPARK-14083) Analyze JVM bytecode and turn closures into Catalyst expressions

2017-03-05 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896452#comment-15896452 ] Kazuaki Ishizaki commented on SPARK-14083: -- Does anyone go forward with this? If not, I

[jira] [Commented] (SPARK-19503) Execution Plan Optimizer: avoid sort or shuffle when it does not change end result such as df.sort(...).count()

2017-03-04 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15895740#comment-15895740 ] Kazuaki Ishizaki commented on SPARK-19503: -- Is it better to control whether we prune local

[jira] [Commented] (SPARK-19503) Execution Plan Optimizer: avoid sort or shuffle when it does not change end result such as df.sort(...).count()

2017-03-01 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15891699#comment-15891699 ] Kazuaki Ishizaki commented on SPARK-19503: -- If it is good to leave sort intact for now, do we

[jira] [Commented] (SPARK-19468) Dataset slow because of unnecessary shuffles

2017-03-01 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15891658#comment-15891658 ] Kazuaki Ishizaki commented on SPARK-19468: -- Interesting. For {{val joined1 = ds1.joinWith(ds2

[jira] [Comment Edited] (SPARK-19741) ClassCastException when using Dataset with type containing value types

2017-03-01 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15891548#comment-15891548 ] Kazuaki Ishizaki edited comment on SPARK-19741 at 3/2/17 3:17 AM: -- I am

[jira] [Commented] (SPARK-19741) ClassCastException when using Dataset with type containing value types

2017-03-01 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15891548#comment-15891548 ] Kazuaki Ishizaki commented on SPARK-19741: -- I am afraid whether my sample program succeeded

[jira] [Commented] (SPARK-19741) ClassCastException when using Dataset with type containing value types

2017-03-01 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890712#comment-15890712 ] Kazuaki Ishizaki commented on SPARK-19741: -- The following program causes an exception regarding

[jira] [Updated] (SPARK-19786) Facilitate loop optimizations in a JIT compiler regarding range()

2017-03-01 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki updated SPARK-19786: - Summary: Facilitate loop optimizations in a JIT compiler regarding range

[jira] [Created] (SPARK-19786) Facilitate loop optimization in a JIT compiler regarding range()

2017-03-01 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-19786: Summary: Facilitate loop optimization in a JIT compiler regarding range() Key: SPARK-19786 URL: https://issues.apache.org/jira/browse/SPARK-19786 Project

[jira] [Commented] (SPARK-19741) ClassCastException when using Dataset with type containing value types

2017-02-26 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15885171#comment-15885171 ] Kazuaki Ishizaki commented on SPARK-19741: -- Would it be possible to attach a pair of the error

[jira] [Commented] (SPARK-15678) Not use cache on appends and overwrites

2017-02-24 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15883035#comment-15883035 ] Kazuaki Ishizaki commented on SPARK-15678: -- Sorry for being late to reply. According

[jira] [Commented] (SPARK-15678) Not use cache on appends and overwrites

2017-02-21 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877575#comment-15877575 ] Kazuaki Ishizaki commented on SPARK-15678: -- How about insert {{spark.catalog.refreshByPath

[jira] [Comment Edited] (SPARK-15678) Not use cache on appends and overwrites

2017-02-21 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15877575#comment-15877575 ] Kazuaki Ishizaki edited comment on SPARK-15678 at 2/22/17 6:37 AM: --- How

Re: A DataFrame cache bug

2017-02-21 Thread Kazuaki Ishizaki
"id>10") return correct result. f(df1).count // output 89 which is incorrect Regards, Kazuaki Ishizaki From: gen tang <gen.tan...@gmail.com> To: dev@spark.apache.org Date: 2017/02/22 15:02 Subject:Re: A DataFrame cache bug Hi All, I might find a related issue

[jira] [Commented] (SPARK-19653) `Vector` Type Should Be A First-Class Citizen In Spark SQL

2017-02-17 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872901#comment-15872901 ] Kazuaki Ishizaki commented on SPARK-19653: -- cc: [~cloud_fan] > `Vector` Type Should Be A Fi

Re: welcoming Takuya Ueshin as a new Apache Spark committer

2017-02-13 Thread Kazuaki Ishizaki
Congrats! Kazuaki Ishizaki From: Reynold Xin <r...@databricks.com> To: "dev@spark.apache.org" <dev@spark.apache.org> Date: 2017/02/14 04:18 Subject:welcoming Takuya Ueshin as a new Apache Spark committer Hi all, Takuya-san has recently been elected an

[jira] [Resolved] (SPARK-16043) Prepare GenericArrayData implementation specialized for a primitive array

2017-02-03 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki resolved SPARK-16043. -- Resolution: Fixed Fix Version/s: 2.2.0 > Prepare GenericArrayD

[jira] [Commented] (SPARK-19372) Code generation for Filter predicate including many OR conditions exceeds JVM method size limit

2017-02-03 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851472#comment-15851472 ] Kazuaki Ishizaki commented on SPARK-19372: -- I was able to reproduce this. I am thinking how

Re: Spark performance tests

2017-01-10 Thread Kazuaki Ishizaki
Hi, You may find several micro-benchmarks under https://github.com/apache/spark/tree/master/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark . Regards, Kazuaki Ishizaki From: Prasun Ratn <prasun.r...@gmail.com> To: Apache Spark Dev <dev@spark.apache.org> Dat

[jira] [Commented] (SPARK-19008) Avoid boxing/unboxing overhead of calling a lambda with primitive type from Dataset program

2016-12-26 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15779544#comment-15779544 ] Kazuaki Ishizaki commented on SPARK-19008: -- I will work on this > Avoid boxing/unbox

[jira] [Updated] (SPARK-19008) Avoid boxing/unboxing overhead of calling a lambda with primitive type from Dataset program

2016-12-26 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki updated SPARK-19008: - Description: In a [discussion|https://github.com/apache/spark/pull/16391

[jira] [Created] (SPARK-19008) Avoid boxing/unboxing overhead of calling a lambda with primitive type from Dataset program

2016-12-26 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-19008: Summary: Avoid boxing/unboxing overhead of calling a lambda with primitive type from Dataset program Key: SPARK-19008 URL: https://issues.apache.org/jira/browse/SPARK

[jira] [Commented] (SPARK-14083) Analyze JVM bytecode and turn closures into Catalyst expressions

2016-12-26 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15779288#comment-15779288 ] Kazuaki Ishizaki commented on SPARK-14083: -- [Here|https://github.com/apache/spark/pull/16391

Sharing data in columnar storage between two applications

2016-12-25 Thread Kazuaki Ishizaki
is that both applications support Apache Arrow APIs. Other approaches are possible. What approach would be good for all applications? Regards, Kazuaki Ishizaki

[jira] [Commented] (SPARK-18859) Catalyst codegen does not mark column as nullable when it should. Causes NPE

2016-12-20 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15764789#comment-15764789 ] Kazuaki Ishizaki commented on SPARK-18859: -- I think that this is an issue in join operation

[jira] [Commented] (SPARK-18814) CheckAnalysis rejects TPCDS query 32

2016-12-12 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15742828#comment-15742828 ] Kazuaki Ishizaki commented on SPARK-18814: -- I found the same error

[jira] [Commented] (SPARK-16073) Performance of Parquet encodings on saving primitive arrays

2016-12-11 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15740807#comment-15740807 ] Kazuaki Ishizaki commented on SPARK-16073: -- It is an interesting topic. In the current situation

[jira] [Comment Edited] (SPARK-18745) java.lang.IndexOutOfBoundsException running query 68 Spark SQL on (100TB)

2016-12-09 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15735580#comment-15735580 ] Kazuaki Ishizaki edited comment on SPARK-18745 at 12/9/16 3:29 PM: --- I

[jira] [Comment Edited] (SPARK-18745) java.lang.IndexOutOfBoundsException running query 68 Spark SQL on (100TB)

2016-12-09 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15735580#comment-15735580 ] Kazuaki Ishizaki edited comment on SPARK-18745 at 12/9/16 3:21 PM: --- I

[jira] [Commented] (SPARK-18745) java.lang.IndexOutOfBoundsException running query 68 Spark SQL on (100TB)

2016-12-09 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15735580#comment-15735580 ] Kazuaki Ishizaki commented on SPARK-18745: -- I identified a root cause

Re: Reduce memory usage of UnsafeInMemorySorter

2016-12-08 Thread Kazuaki Ishizaki
The line I pointed out would work correctly, because the type of this division is double. d2i correctly handles overflow cases. Kazuaki Ishizaki From: Nicholas Chammas <nicholas.cham...@gmail.com> To: Kazuaki Ishizaki/Japan/IBM@IBMJP, Reynold Xin <r...@databrick
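
The point about d2i in the message above can be illustrated with a small, self-contained Java sketch. It is not the Spark source line under discussion: the class name `D2iDemo`, both helper methods, and the sample values are hypothetical and chosen only for illustration. What it relies on is standard Java behaviour: dividing a long by a double yields a double, and narrowing a double to int (the JVM `d2i` instruction) saturates at `Integer.MAX_VALUE` instead of wrapping, whereas narrowing a long to int (`l2i`) keeps only the low 32 bits.

```java
public class D2iDemo {
    // Hypothetical helper: long / double is evaluated as a double,
    // so the (int) cast is a double-to-int narrowing (d2i).
    static int divideToInt(long size, double divisor) {
        return (int) (size / divisor);
    }

    // Hypothetical helper for contrast: long-to-int narrowing (l2i)
    // simply drops the high 32 bits.
    static int narrowLong(long value) {
        return (int) value;
    }

    public static void main(String[] args) {
        // 12e9 / 1.5 = 8e9, which exceeds the int range; d2i saturates.
        System.out.println(divideToInt(12_000_000_000L, 1.5)); // 2147483647
        // 2^31 does not fit in an int; l2i wraps to the minimum value.
        System.out.println(narrowLong(2_147_483_648L));        // -2147483648
    }
}
```

If the division were performed in long arithmetic and then cast, an oversized result could wrap to a negative or otherwise wrong value; going through double keeps an oversized result clamped at the int maximum.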

Re: Reduce memory usage of UnsafeInMemorySorter

2016-12-07 Thread Kazuaki Ishizaki
/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java#L156 Regards, Kazuaki Ishizaki From: Reynold Xin <r...@databricks.com> To: Nicholas Chammas <nicholas.cham...@gmail.com> Cc: Spark dev list <dev@spark.apache.org> Date: 2016/12/07 14:27 Subject:Re: R

[jira] [Updated] (SPARK-18745) java.lang.IndexOutOfBoundsException running query 68 Spark SQL on (100TB)

2016-12-06 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki updated SPARK-18745: - Affects Version/s: 2.2.0 > java.lang.IndexOutOfBoundsException running query 68 Sp

[jira] [Commented] (SPARK-18745) java.lang.IndexOutOfBoundsException running query 68 Spark SQL on (100TB)

2016-12-06 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15726486#comment-15726486 ] Kazuaki Ishizaki commented on SPARK-18745: -- I work with [~jfc...@us.ibm.com

[jira] [Created] (SPARK-18653) Dataset.show() generates incorrect padding for Unicode Character

2016-11-30 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-18653: Summary: Dataset.show() generates incorrect padding for Unicode Character Key: SPARK-18653 URL: https://issues.apache.org/jira/browse/SPARK-18653 Project

[jira] [Commented] (SPARK-17680) Unicode Character Support for Column Names and Comments

2016-11-29 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15707629#comment-15707629 ] Kazuaki Ishizaki commented on SPARK-17680: -- Sorry, it is my mistake. > Unicode Charac

[jira] [Commented] (SPARK-18502) Spark does not handle columns that contain backquote (`)

2016-11-29 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15706385#comment-15706385 ] Kazuaki Ishizaki commented on SPARK-18502: -- I can reproduce this exception using the following

[jira] [Commented] (SPARK-18492) GeneratedIterator grows beyond 64 KB

2016-11-28 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15702760#comment-15702760 ] Kazuaki Ishizaki commented on SPARK-18492: -- I realized that the following code can reproduce

[jira] [Resolved] (SPARK-15950) Eliminate unreachable code at projection for complex types

2016-11-27 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki resolved SPARK-15950. -- Resolution: Duplicate > Eliminate unreachable code at projection for complex ty

Re: Linear regression + Janino Exception

2016-11-21 Thread Kazuaki Ishizaki
Thank you for reporting the error. I think that this is related to https://issues.apache.org/jira/browse/SPARK-18492 The reporter of this JIRA entry has not posted the program yet. Would it be possible to add your program that can reproduce this issue to this JIRA entry? Regards, Kazuaki

[jira] [Comment Edited] (SPARK-18492) GeneratedIterator grows beyond 64 KB

2016-11-17 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674599#comment-15674599 ] Kazuaki Ishizaki edited comment on SPARK-18492 at 11/17/16 7:31 PM: I

[jira] [Commented] (SPARK-18492) GeneratedIterator grows beyond 64 KB

2016-11-17 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674599#comment-15674599 ] Kazuaki Ishizaki commented on SPARK-18492: -- Can you post a small program that can reproduce

[jira] [Commented] (SPARK-18458) core dumped running Spark SQL on large data volume (100TB)

2016-11-16 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15671237#comment-15671237 ] Kazuaki Ishizaki commented on SPARK-18458: -- I worked with [~jfc...@us.ibm.com]. Then, I

[jira] [Issue Comment Deleted] (SPARK-18458) core dumped running Spark SQL on large data volume (100TB)

2016-11-16 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki updated SPARK-18458: - Comment: was deleted (was: I worked with [~jfc...@us.ibm.com]. Then, I identified

[jira] [Commented] (SPARK-18458) core dumped running Spark SQL on large data volume (100TB)

2016-11-16 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15671229#comment-15671229 ] Kazuaki Ishizaki commented on SPARK-18458: -- I worked with [~jfc...@us.ibm.com]. Then, I

[jira] [Commented] (SPARK-18458) core dumped running Spark SQL on large data volume (100TB)

2016-11-16 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670828#comment-15670828 ] Kazuaki Ishizaki commented on SPARK-18458: -- I see. I will do that. > core dumped running Sp

[jira] [Commented] (SPARK-18458) core dumped running Spark SQL on large data volume (100TB)

2016-11-15 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15668948#comment-15668948 ] Kazuaki Ishizaki commented on SPARK-18458: -- I will work on this. > core dumped running Spark

[jira] [Updated] (SPARK-18284) Scheme of DataFrame generated from RDD is diffrent between master and 2.0

2016-11-05 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki updated SPARK-18284: - Description: When the following program is executed, a schema of dataframe is different

[jira] [Updated] (SPARK-18284) Scheme of DataFrame generated from RDD is diffrent between master and 2.0

2016-11-05 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki updated SPARK-18284: - Affects Version/s: 2.1.0 > Scheme of DataFrame generated from RDD is diffrent betw

[jira] [Created] (SPARK-18284) Scheme of DataFrame generated from RDD is diffrent between master and 2.0

2016-11-04 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-18284: Summary: Scheme of DataFrame generated from RDD is diffrent between master and 2.0 Key: SPARK-18284 URL: https://issues.apache.org/jira/browse/SPARK-18284

[jira] [Commented] (SPARK-18207) class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection" grows beyond 64 KB

2016-11-02 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629522#comment-15629522 ] Kazuaki Ishizaki commented on SPARK-18207: -- I created a smaller program to reproduce

[jira] [Commented] (SPARK-18125) Spark generated code causes CompileException when groupByKey, reduceGroups and map(_._2) are used

2016-10-28 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615034#comment-15615034 ] Kazuaki Ishizaki commented on SPARK-18125: -- I confirmed this code can reproduce on 2.0.1

[jira] [Commented] (SPARK-18147) Broken Spark SQL Codegen

2016-10-28 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15614959#comment-15614959 ] Kazuaki Ishizaki commented on SPARK-18147: -- This also cause the same exception. {code:java

Re: Spark SQL is slower when DataFrame is cache in Memory

2016-10-27 Thread Kazuaki Ishizaki
Hi Chin Wei, Thank you for confirming this on 2.0.1. I am happy to hear it never happens. The performance will be improved when this PR (https://github.com/apache/spark/pull/15219) is integrated. Regards, Kazuaki Ishizaki From: Chin Wei Low <lowchin...@gmail.com> To: K

Re: [Spark 2.0.1] Error in generated code, possible regression?

2016-10-25 Thread Kazuaki Ishizaki
Do you have a smaller program that can reproduce the same error? If you also create a JIRA entry, it would be great. Kazuaki Ishizaki From: Efe Selcuk <efema...@gmail.com> To: "user @spark" <user@spark.apache.org> Date: 2016/10/25 10:23 Subject:

Re: Spark SQL is slower when DataFrame is cache in Memory

2016-10-24 Thread Kazuaki Ishizaki
Hi Chin Wei, I am sorry for being late to reply. Got it. Interesting behavior. How did you measure the time between 1st and 2nd events? Best Regards, Kazuaki Ishizaki From: Chin Wei Low <lowchin...@gmail.com> To: Kazuaki Ishizaki/Japan/IBM@IBMJP Cc: user@spark.apache.or

[jira] [Commented] (SPARK-15687) Columnar execution engine

2016-10-19 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15588641#comment-15588641 ] Kazuaki Ishizaki commented on SPARK-15687: -- [#15219|https://github.com/apache/spark/pull/15219

[jira] [Created] (SPARK-17915) Prepare ColumnVector implementation for UnsafeData

2016-10-13 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-17915: Summary: Prepare ColumnVector implementation for UnsafeData Key: SPARK-17915 URL: https://issues.apache.org/jira/browse/SPARK-17915 Project: Spark

[jira] [Created] (SPARK-17912) Refactor code generation to get data for ColumnVector/ColumnarBatch

2016-10-13 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-17912: Summary: Refactor code generation to get data for ColumnVector/ColumnarBatch Key: SPARK-17912 URL: https://issues.apache.org/jira/browse/SPARK-17912 Project

[jira] [Created] (SPARK-17905) Added test cases for InMemoryRelation

2016-10-13 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-17905: Summary: Added test cases for InMemoryRelation Key: SPARK-17905 URL: https://issues.apache.org/jira/browse/SPARK-17905 Project: Spark Issue Type

[jira] [Commented] (SPARK-16845) org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB

2016-10-12 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15569493#comment-15569493 ] Kazuaki Ishizaki commented on SPARK-16845: -- Thank you for preparing the case. I noticed

[jira] [Issue Comment Deleted] (SPARK-16845) org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB

2016-10-12 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki updated SPARK-16845: - Comment: was deleted (was: Thank you for preparing the case. I noticed

[jira] [Commented] (SPARK-16845) org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB

2016-10-12 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15569489#comment-15569489 ] Kazuaki Ishizaki commented on SPARK-16845: -- Thank you for preparing the case. I noticed

[jira] [Resolved] (SPARK-16223) Codegen failure with a Dataframe program using an array

2016-10-08 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki resolved SPARK-16223. -- Resolution: Fixed > Codegen failure with a Dataframe program using an ar

[jira] [Commented] (SPARK-16223) Codegen failure with a Dataframe program using an array

2016-10-08 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15557853#comment-15557853 ] Kazuaki Ishizaki commented on SPARK-16223: -- When I rerun it with {{commit

[jira] [Created] (SPARK-17490) Optimize SerializeFromObject for primitive array

2016-09-10 Thread Kazuaki Ishizaki (JIRA)
Kazuaki Ishizaki created SPARK-17490: Summary: Optimize SerializeFromObject for primitive array Key: SPARK-17490 URL: https://issues.apache.org/jira/browse/SPARK-17490 Project: Spark

[jira] [Updated] (SPARK-16213) Reduce runtime overhead of a program that creates an primitive array in DataFrame

2016-09-05 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Ishizaki updated SPARK-16213: - Affects Version/s: 2.0.0 > Reduce runtime overhead of a program that crea

Re: Cache'ing performance

2016-08-27 Thread Kazuaki Ishizaki
pull/11956. This PR shows room for performance improvement for float/double values that are not compressed. Kazuaki Ishizaki From: linguin@gmail.com To: Maciej Bryński <mac...@brynski.pl> Cc: Spark dev list <dev@spark.apache.org> Date: 2016/08/28 11:30 Subject

Re: Cache'ing performance

2016-08-27 Thread Kazuaki Ishizaki
these pull requests. Best Regards, Kazuaki Ishizaki From: Maciej Bryński <mac...@brynski.pl> To: Spark dev list <dev@spark.apache.org> Date: 2016/08/28 05:40 Subject:Cache'ing performance Hi, I did some benchmark of cache function today. RDD sc.parallelize(0 unt

Re: Change nullable property in Dataset schema

2016-08-17 Thread Kazuaki Ishizaki
ork for my purpose. It actually does nothing. Kazuaki Ishizaki From: Jacek Laskowski <ja...@japila.pl> To: Kazuaki Ishizaki/Japan/IBM@IBMJP Cc: user <user@spark.apache.org> Date: 2016/08/15 04:56 Subject:Re: Change nullable property in Dataset schema On Wed,
