[jira] [Commented] (SPARK-24410) Missing optimization for Union on bucketed tables

2018-08-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578763#comment-16578763 ] Liang-Chi Hsieh commented on SPARK-24410: - The above code shows that the two tables in union

[jira] [Commented] (SPARK-22347) UDF is evaluated when 'F.when' condition is false

2018-08-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578709#comment-16578709 ] Liang-Chi Hsieh commented on SPARK-22347: - Agreed. Thanks [~rdblue] > UDF is evaluated when

[jira] [Commented] (SPARK-25080) NPE in HiveShim$.toCatalystDecimal(HiveShim.scala:110)

2018-08-11 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577379#comment-16577379 ] Liang-Chi Hsieh commented on SPARK-25080: - Did a quick test but can't reproduce that. Is it

[jira] [Comment Edited] (SPARK-24152) SparkR CRAN feasibility check server problem

2018-08-11 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577065#comment-16577065 ] Liang-Chi Hsieh edited comment on SPARK-24152 at 8/11/18 9:23 PM: -- CRAN

[jira] [Commented] (SPARK-24152) SparkR CRAN feasibility check server problem

2018-08-11 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16577065#comment-16577065 ] Liang-Chi Hsieh commented on SPARK-24152: - CRAN sysadmin replied they fixed it now. Looks good

[jira] [Commented] (SPARK-24152) SparkR CRAN feasibility check server problem

2018-08-10 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576862#comment-16576862 ] Liang-Chi Hsieh commented on SPARK-24152: - I found retriggered test still failed. I found out

[jira] [Comment Edited] (SPARK-24152) SparkR CRAN feasibility check server problem

2018-08-10 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576559#comment-16576559 ] Liang-Chi Hsieh edited comment on SPARK-24152 at 8/10/18 4:53 PM: -- I

[jira] [Commented] (SPARK-24152) SparkR CRAN feasibility check server problem

2018-08-10 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576559#comment-16576559 ] Liang-Chi Hsieh commented on SPARK-24152: - I checked locally. Seems fine, I don't see the error

[jira] [Commented] (SPARK-24152) SparkR CRAN feasibility check server problem

2018-08-10 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16576491#comment-16576491 ] Liang-Chi Hsieh commented on SPARK-24152: - Sorry just see this. I will ask CRAN sysadmin again.

[jira] [Created] (SPARK-25010) Rand/Randn should produce different values for each execution in streaming query

2018-08-02 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-25010: --- Summary: Rand/Randn should produce different values for each execution in streaming query Key: SPARK-25010 URL: https://issues.apache.org/jira/browse/SPARK-25010

[jira] [Commented] (SPARK-24906) Enlarge split size for columnar file to ensure the task read enough data

2018-07-24 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16554716#comment-16554716 ] Liang-Chi Hsieh commented on SPARK-24906: - A {{maxPartitionBytes}} value adapted improperly

[jira] [Updated] (SPARK-24896) Uuid expression should produce different values in each execution under streaming query

2018-07-23 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-24896: Component/s: Structured Streaming > Uuid expression should produce different values in

[jira] [Created] (SPARK-24896) Uuid expression should produce different values in each execution under streaming query

2018-07-23 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-24896: --- Summary: Uuid expression should produce different values in each execution under streaming query Key: SPARK-24896 URL: https://issues.apache.org/jira/browse/SPARK-24896

[jira] [Resolved] (SPARK-24885) Initialize random seeds for Rand and Randn expression during analysis

2018-07-22 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh resolved SPARK-24885. - Resolution: Won't Fix > Initialize random seeds for Rand and Randn expression during

[jira] [Updated] (SPARK-24885) Initialize random seeds for Rand and Randn expression during analysis

2018-07-22 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-24885: Description: Random expressions such as Rand and Randn should have the same behavior as

[jira] [Updated] (SPARK-24885) Initialize random seeds for Rand and Randn expression during analysis

2018-07-22 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-24885: Summary: Initialize random seeds for Rand and Randn expression during analysis (was:

[jira] [Updated] (SPARK-24885) Rand and Randn expression should produce same result at DataFrame on retries

2018-07-22 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-24885: Description: Random expressions such as Rand and Randn should have the same behavior as

[jira] [Created] (SPARK-24885) Rand and Randn expression should produce same result at DataFrame on retries

2018-07-22 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-24885: --- Summary: Rand and Randn expression should produce same result at DataFrame on retries Key: SPARK-24885 URL: https://issues.apache.org/jira/browse/SPARK-24885

[jira] [Comment Edited] (SPARK-24875) MulticlassMetrics should offer a more efficient way to compute count by label

2018-07-20 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551457#comment-16551457 ] Liang-Chi Hsieh edited comment on SPARK-24875 at 7/21/18 12:21 AM: ---

[jira] [Commented] (SPARK-24875) MulticlassMetrics should offer a more efficient way to compute count by label

2018-07-20 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551457#comment-16551457 ] Liang-Chi Hsieh commented on SPARK-24875: - hmm, I think for calculation of precision, recall and

[jira] [Commented] (SPARK-24862) Spark Encoder is not consistent to scala case class semantic for multiple argument lists

2018-07-20 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551416#comment-16551416 ] Liang-Chi Hsieh commented on SPARK-24862: - Isn't it inconsistent between the schema and the

[jira] [Commented] (SPARK-24862) Spark Encoder is not consistent to scala case class semantic for multiple argument lists

2018-07-20 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551174#comment-16551174 ] Liang-Chi Hsieh commented on SPARK-24862: - Even we only retrieve the first parameter list at

[jira] [Commented] (SPARK-24847) ScalaReflection#schemaFor occasionally fails to detect schema for Seq of type alias

2018-07-19 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16550091#comment-16550091 ] Liang-Chi Hsieh commented on SPARK-24847: - I can't reproduce this currently. >

[jira] [Commented] (SPARK-24835) col function ignores drop

2018-07-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547123#comment-16547123 ] Liang-Chi Hsieh commented on SPARK-24835: - `drop` actually adds a projection on top of

[jira] [Comment Edited] (SPARK-24835) col function ignores drop

2018-07-17 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547123#comment-16547123 ] Liang-Chi Hsieh edited comment on SPARK-24835 at 7/17/18 9:25 PM: --

[jira] [Commented] (SPARK-24666) Word2Vec generate infinity vectors when numIterations are large

2018-07-08 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536538#comment-16536538 ] Liang-Chi Hsieh commented on SPARK-24666: - Is it possible you can provide an example dataset and

[jira] [Created] (SPARK-24762) Aggregator should be able to use Option of Product encoder

2018-07-08 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-24762: --- Summary: Aggregator should be able to use Option of Product encoder Key: SPARK-24762 URL: https://issues.apache.org/jira/browse/SPARK-24762 Project: Spark

[jira] [Commented] (SPARK-24756) Incorrect Statistics

2018-07-07 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16535976#comment-16535976 ] Liang-Chi Hsieh commented on SPARK-24756: - Because seems not yet a suitable approach to estimate

[jira] [Commented] (SPARK-24152) SparkR CRAN feasibility check server problem

2018-07-06 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16535474#comment-16535474 ] Liang-Chi Hsieh commented on SPARK-24152: - I've noticed it too and already asked CRAN sysadmin

[jira] [Commented] (SPARK-24746) AWS S3 301 Moved Permanently error message even after setting fs.s3a.endpoint for bucket in Mumbai region.

2018-07-05 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534443#comment-16534443 ] Liang-Chi Hsieh commented on SPARK-24746: - Maybe related:

[jira] [Commented] (SPARK-24464) Unit tests for MLlib's Instrumentation

2018-07-05 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533461#comment-16533461 ] Liang-Chi Hsieh commented on SPARK-24464: - Because {{Instrumentation}} is used for logging, I

[jira] [Commented] (SPARK-24438) Empty strings and null strings are written to the same partition

2018-07-04 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16532392#comment-16532392 ] Liang-Chi Hsieh commented on SPARK-24438: - From the code, looks like we intentionally treat

[jira] [Commented] (SPARK-24467) VectorAssemblerEstimator

2018-07-04 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16532324#comment-16532324 ] Liang-Chi Hsieh commented on SPARK-24467: - It sounds good to me for the approach similar to one

[jira] [Commented] (SPARK-24528) Missing optimization for Aggregations/Windowing on a bucketed table

2018-07-01 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529361#comment-16529361 ] Liang-Chi Hsieh commented on SPARK-24528: - I think we can have a sql config to control

[jira] [Commented] (SPARK-24689) java.io.NotSerializableException: org.apache.spark.mllib.clustering.DistributedLDAModel

2018-06-29 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16528490#comment-16528490 ] Liang-Chi Hsieh commented on SPARK-24689: - I think we don't set priority as Blocker which is

[jira] [Commented] (SPARK-24667) If folders managed by DiskBlockManager are deleted manually, shell throws FileNotFoundException

2018-06-29 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16527347#comment-16527347 ] Liang-Chi Hsieh commented on SPARK-24667: - I think it is not a bug... > If folders managed by

[jira] [Created] (SPARK-24635) Remove Blocks class

2018-06-23 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-24635: --- Summary: Remove Blocks class Key: SPARK-24635 URL: https://issues.apache.org/jira/browse/SPARK-24635 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-24607) Distribute by rand() can lead to data inconsistency

2018-06-21 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16519119#comment-16519119 ] Liang-Chi Hsieh commented on SPARK-24607: - [~gostop_zlx] If you don't give a seed to {{rand}},

[jira] [Comment Edited] (SPARK-24607) Distribute by rand() can lead to data inconsistency

2018-06-20 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518206#comment-16518206 ] Liang-Chi Hsieh edited comment on SPARK-24607 at 6/20/18 2:40 PM: --

[jira] [Commented] (SPARK-24607) Distribute by rand() can lead to data inconsistency

2018-06-20 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518206#comment-16518206 ] Liang-Chi Hsieh commented on SPARK-24607: - Thanks [~mgaido]! As I check {{Rand}} expression,

[jira] [Commented] (SPARK-24607) Distribute by rand() can lead to data inconsistency

2018-06-20 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16518184#comment-16518184 ] Liang-Chi Hsieh commented on SPARK-24607: - From the following test, it looks ok.

[jira] [Comment Edited] (SPARK-24465) LSHModel should support Structured Streaming for transform

2018-06-15 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513649#comment-16513649 ] Liang-Chi Hsieh edited comment on SPARK-24465 at 6/15/18 10:32 AM: ---

[jira] [Commented] (SPARK-24465) LSHModel should support Structured Streaming for transform

2018-06-15 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513649#comment-16513649 ] Liang-Chi Hsieh commented on SPARK-24465: - I'm not sure if SPARK-12878 is a real issue. Seems

[jira] [Commented] (SPARK-12878) Dataframe fails with nested User Defined Types

2018-06-15 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16513647#comment-16513647 ] Liang-Chi Hsieh commented on SPARK-12878: - Is this a real issue? Seems to me that you can't

[jira] [Updated] (SPARK-24548) JavaPairRDD to Dataset in SPARK generates ambiguous results

2018-06-15 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-24548: Component/s: (was: Spark Core) > JavaPairRDD to Dataset in SPARK generates ambiguous

[jira] [Updated] (SPARK-24548) JavaPairRDD to Dataset in SPARK generates ambiguous results

2018-06-15 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-24548: Component/s: SQL > JavaPairRDD to Dataset in SPARK generates ambiguous results >

[jira] [Commented] (SPARK-24528) Missing optimization for Aggregations/Windowing on a bucketed table

2018-06-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511924#comment-16511924 ] Liang-Chi Hsieh commented on SPARK-24528: - Btw, I think the complete and reproducible examples

[jira] [Updated] (SPARK-24505) Convert strings in codegen to blocks: Cast and BoundAttribute

2018-06-12 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-24505: Description: The CodeBlock interpolator now accepts strings. Based on previous

[jira] [Updated] (SPARK-24505) Convert strings in codegen to blocks: Cast and BoundAttribute

2018-06-12 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-24505: Summary: Convert strings in codegen to blocks: Cast and BoundAttribute (was: Forbidding

[jira] [Comment Edited] (SPARK-23596) Modify Dataset test harness to include interpreted execution

2018-06-11 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509137#comment-16509137 ] Liang-Chi Hsieh edited comment on SPARK-23596 at 6/12/18 3:55 AM: -- One

[jira] [Commented] (SPARK-23596) Modify Dataset test harness to include interpreted execution

2018-06-11 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509137#comment-16509137 ] Liang-Chi Hsieh commented on SPARK-23596: - One concern I have right now is that by testing the

[jira] [Commented] (SPARK-24517) Bug in loading unstructured data

2018-06-11 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509117#comment-16509117 ] Liang-Chi Hsieh commented on SPARK-24517: - Does it also happen when using built-in json

[jira] [Created] (SPARK-24505) Forbidding string interpolation in CodeBlock

2018-06-09 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-24505: --- Summary: Forbidding string interpolation in CodeBlock Key: SPARK-24505 URL: https://issues.apache.org/jira/browse/SPARK-24505 Project: Spark Issue

[jira] [Commented] (SPARK-24504) Implement SparkSQL authorization plugin in Apache Ranger

2018-06-09 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506889#comment-16506889 ] Liang-Chi Hsieh commented on SPARK-24504: - Seems you created duplicate ticket, can you close one

[jira] [Comment Edited] (SPARK-24447) Pyspark RowMatrix.columnSimilarities() loses spark context

2018-06-06 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16504213#comment-16504213 ] Liang-Chi Hsieh edited comment on SPARK-24447 at 6/7/18 4:09 AM: - I just

[jira] [Commented] (SPARK-24447) Pyspark RowMatrix.columnSimilarities() loses spark context

2018-06-06 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16504213#comment-16504213 ] Liang-Chi Hsieh commented on SPARK-24447: - I just build Spark from current 2.3 branch. The above

[jira] [Commented] (SPARK-24447) Pyspark RowMatrix.columnSimilarities() loses spark context

2018-06-06 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16504152#comment-16504152 ] Liang-Chi Hsieh commented on SPARK-24447: - Yes, I can run the example code on a build from

[jira] [Comment Edited] (SPARK-24357) createDataFrame in Python infers large integers as long type and then fails silently when converting them

2018-06-06 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503065#comment-16503065 ] Liang-Chi Hsieh edited comment on SPARK-24357 at 6/6/18 9:50 AM: - I

[jira] [Commented] (SPARK-24357) createDataFrame in Python infers large integers as long type and then fails silently when converting them

2018-06-06 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503065#comment-16503065 ] Liang-Chi Hsieh commented on SPARK-24357: - I think this is because this number {{1 << 65}}

[jira] [Commented] (SPARK-24467) VectorAssemblerEstimator

2018-06-06 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503002#comment-16503002 ] Liang-Chi Hsieh commented on SPARK-24467: - [~josephkb] Does that mean {{VectorAssembler}} will

[jira] [Commented] (SPARK-24447) Pyspark RowMatrix.columnSimilarities() loses spark context

2018-06-06 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16502876#comment-16502876 ] Liang-Chi Hsieh commented on SPARK-24447: - I can't reproduce this in current master branch. Can

[jira] [Commented] (SPARK-24410) Missing optimization for Union on bucketed tables

2018-05-31 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16496178#comment-16496178 ] Liang-Chi Hsieh commented on SPARK-24410: - Yeah, it depends on how we combine the RDDs from

[jira] [Comment Edited] (SPARK-24410) Missing optimization for Union on bucketed tables

2018-05-30 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16496132#comment-16496132 ] Liang-Chi Hsieh edited comment on SPARK-24410 at 5/31/18 5:39 AM: -- We

[jira] [Commented] (SPARK-24410) Missing optimization for Union on bucketed tables

2018-05-30 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16496132#comment-16496132 ] Liang-Chi Hsieh commented on SPARK-24410: - We can verify the partition of union dataframe:

[jira] [Comment Edited] (SPARK-24410) Missing optimization for Union on bucketed tables

2018-05-30 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494885#comment-16494885 ] Liang-Chi Hsieh edited comment on SPARK-24410 at 5/30/18 2:20 PM: -- I've

[jira] [Comment Edited] (SPARK-24410) Missing optimization for Union on bucketed tables

2018-05-30 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494885#comment-16494885 ] Liang-Chi Hsieh edited comment on SPARK-24410 at 5/30/18 8:41 AM: -- I've

[jira] [Commented] (SPARK-24410) Missing optimization for Union on bucketed tables

2018-05-30 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494885#comment-16494885 ] Liang-Chi Hsieh commented on SPARK-24410: - I've done some experiments locally. But the results

[jira] [Commented] (SPARK-24409) exception when sending large list in filter(col(x).isin(list))

2018-05-29 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494647#comment-16494647 ] Liang-Chi Hsieh commented on SPARK-24409: - Seems you use AWS Glue Data Catalog as the Metastore

[jira] [Commented] (SPARK-24410) Missing optimization for Union on bucketed tables

2018-05-29 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16494391#comment-16494391 ] Liang-Chi Hsieh commented on SPARK-24410: - [~cloud_fan] Thanks for pinging me. I'll look into

[jira] [Created] (SPARK-24361) Polish code block manipulation API

2018-05-22 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-24361: --- Summary: Polish code block manipulation API Key: SPARK-24361 URL: https://issues.apache.org/jira/browse/SPARK-24361 Project: Spark Issue Type:

[jira] [Commented] (SPARK-23455) Default Params in ML should be saved separately

2018-05-15 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16476753#comment-16476753 ] Liang-Chi Hsieh commented on SPARK-23455: - According to [~josephkb]'s comment at

[jira] [Created] (SPARK-24259) ArrayWriter for Arrow produces wrong output

2018-05-12 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-24259: --- Summary: ArrayWriter for Arrow produces wrong output Key: SPARK-24259 URL: https://issues.apache.org/jira/browse/SPARK-24259 Project: Spark Issue

[jira] [Created] (SPARK-24242) RangeExec should have correct outputOrdering

2018-05-10 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-24242: --- Summary: RangeExec should have correct outputOrdering Key: SPARK-24242 URL: https://issues.apache.org/jira/browse/SPARK-24242 Project: Spark Issue

[jira] [Commented] (SPARK-21274) Implement EXCEPT ALL and INTERSECT ALL

2018-05-06 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465010#comment-16465010 ] Liang-Chi Hsieh commented on SPARK-21274: - [~dkbiswal] No problem. Current EXCEPT ALL rewrite is

[jira] [Comment Edited] (SPARK-21274) Implement EXCEPT ALL and INTERSECT ALL

2018-05-06 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465000#comment-16465000 ] Liang-Chi Hsieh edited comment on SPARK-21274 at 5/6/18 7:30 AM: - I read

[jira] [Comment Edited] (SPARK-21274) Implement EXCEPT ALL and INTERSECT ALL

2018-05-06 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465000#comment-16465000 ] Liang-Chi Hsieh edited comment on SPARK-21274 at 5/6/18 7:29 AM: - I read

[jira] [Commented] (SPARK-21274) Implement EXCEPT ALL and INTERSECT ALL

2018-05-06 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465000#comment-16465000 ] Liang-Chi Hsieh commented on SPARK-21274: - I read the design doc. It looks correct to me. I found

[jira] [Commented] (SPARK-24152) SparkR CRAN feasibility check server problem

2018-05-03 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462280#comment-16462280 ] Liang-Chi Hsieh commented on SPARK-24152: - Can be resolved now as I saw Jenkins test passed. >

[jira] [Commented] (SPARK-24152) SparkR CRAN feasibility check server problem

2018-05-03 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462030#comment-16462030 ] Liang-Chi Hsieh commented on SPARK-24152: - I think it is fixed now. It works in local. But better

[jira] [Commented] (SPARK-24152) Flaky Test: SparkR

2018-05-02 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461962#comment-16461962 ] Liang-Chi Hsieh commented on SPARK-24152: - CRAN sysadmin replied me it should be fixed now. I

[jira] [Commented] (SPARK-24152) Flaky Test: SparkR

2018-05-02 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461867#comment-16461867 ] Liang-Chi Hsieh commented on SPARK-24152: - Thanks [~hyukjin.kwon] for pinging me. I found a

[jira] [Created] (SPARK-24131) Add majorMinorVersion API to PySpark for determining Spark versions

2018-04-30 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-24131: --- Summary: Add majorMinorVersion API to PySpark for determining Spark versions Key: SPARK-24131 URL: https://issues.apache.org/jira/browse/SPARK-24131 Project:

[jira] [Created] (SPARK-24121) The API for handling expression code generation in expression codegen

2018-04-30 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-24121: --- Summary: The API for handling expression code generation in expression codegen Key: SPARK-24121 URL: https://issues.apache.org/jira/browse/SPARK-24121 Project:

[jira] [Commented] (SPARK-24058) Default Params in ML should be saved separately: Python API

2018-04-23 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449313#comment-16449313 ] Liang-Chi Hsieh commented on SPARK-24058: - OK. I will work on this. Thanks. > Default Params in

[jira] [Commented] (SPARK-23711) Add fallback to interpreted execution logic

2018-04-19 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443959#comment-16443959 ] Liang-Chi Hsieh commented on SPARK-23711: - Ok. > Add fallback to interpreted execution logic >

[jira] [Commented] (SPARK-23711) Add fallback to interpreted execution logic

2018-04-19 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443924#comment-16443924 ] Liang-Chi Hsieh commented on SPARK-23711: - Yeah, I agree that is a good rule. I will prepare a PR

[jira] [Commented] (SPARK-23711) Add fallback to interpreted execution logic

2018-04-19 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443879#comment-16443879 ] Liang-Chi Hsieh commented on SPARK-23711: - object hash aggregate? If you mean

[jira] [Commented] (SPARK-23711) Add fallback to interpreted execution logic

2018-04-18 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443573#comment-16443573 ] Liang-Chi Hsieh commented on SPARK-23711: - About this, I suppose that in

[jira] [Created] (SPARK-24014) Add onStreamingStarted method to StreamingListener

2018-04-18 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-24014: --- Summary: Add onStreamingStarted method to StreamingListener Key: SPARK-24014 URL: https://issues.apache.org/jira/browse/SPARK-24014 Project: Spark

[jira] [Commented] (SPARK-23904) Big execution plan cause OOM

2018-04-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438189#comment-16438189 ] Liang-Chi Hsieh commented on SPARK-23904: - If you don't need UI, can you try to set

[jira] [Commented] (SPARK-23970) pyspark - simple filter/select doesn't use all tasks when coalesce is set

2018-04-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438132#comment-16438132 ] Liang-Chi Hsieh commented on SPARK-23970: - I think the document of {{coalesce}} might answer

[jira] [Created] (SPARK-23979) MultiAlias should not be a CodegenFallback

2018-04-13 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-23979: --- Summary: MultiAlias should not be a CodegenFallback Key: SPARK-23979 URL: https://issues.apache.org/jira/browse/SPARK-23979 Project: Spark Issue Type:

[jira] [Commented] (SPARK-23928) High-order function: shuffle(x) → array

2018-04-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16437102#comment-16437102 ] Liang-Chi Hsieh commented on SPARK-23928: - Hi [~hzlu], So will you take this one? > High-order

[jira] [Commented] (SPARK-23928) High-order function: shuffle(x) → array

2018-04-10 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433406#comment-16433406 ] Liang-Chi Hsieh commented on SPARK-23928: - If no assignee and no one announces, it is no problem

[jira] [Updated] (SPARK-23875) Create IndexedSeq wrapper for ArrayData

2018-04-05 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liang-Chi Hsieh updated SPARK-23875: Description: We don't have a good way to sequentially access {{UnsafeArrayData}} with a

[jira] [Created] (SPARK-23875) Create IndexedSeq wrapper for ArrayData

2018-04-05 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-23875: --- Summary: Create IndexedSeq wrapper for ArrayData Key: SPARK-23875 URL: https://issues.apache.org/jira/browse/SPARK-23875 Project: Spark Issue Type:

[jira] [Created] (SPARK-23873) Use accessors in interpreted LambdaVariable

2018-04-04 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-23873: --- Summary: Use accessors in interpreted LambdaVariable Key: SPARK-23873 URL: https://issues.apache.org/jira/browse/SPARK-23873 Project: Spark Issue

[jira] [Commented] (SPARK-23661) Implement treeAggregate on Dataset API

2018-03-31 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421348#comment-16421348 ] Liang-Chi Hsieh commented on SPARK-23661: - For the implementation of {{Dataset.treeAggregate}},

[jira] [Commented] (SPARK-23835) When Dataset.as converts column from nullable to non-nullable type, null Doubles are converted silently to -1

2018-03-31 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421346#comment-16421346 ] Liang-Chi Hsieh commented on SPARK-23835: - What is the better behavior it should have? > When
