[jira] [Commented] (SPARK-22947) SPIP: as-of join in Spark SQL

2022-02-22 Thread Li Jin (Jira)
[ https://issues.apache.org/jira/browse/SPARK-22947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17496142#comment-17496142 ] Li Jin commented on SPARK-22947: For those who are interested, I will no longer work on this SPIP.

[jira] [Commented] (SPARK-33057) Cannot use filter with window operations

2020-10-16 Thread Li Jin (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215468#comment-17215468 ] Li Jin commented on SPARK-33057: I agree this is an improvement rather than a bug. Although, I am not

[jira] [Updated] (SPARK-33057) Cannot use filter with window operations

2020-10-02 Thread Li Jin (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-33057: --- Description: Currently, trying to use filter with a window operation will fail: {code:java} df =

[jira] [Created] (SPARK-33057) Cannot use filter with window operations

2020-10-02 Thread Li Jin (Jira)
Li Jin created SPARK-33057: -- Summary: Cannot use filter with window operations Key: SPARK-33057 URL: https://issues.apache.org/jira/browse/SPARK-33057 Project: Spark Issue Type: Bug

[jira] [Comment Edited] (SPARK-28482) Data incomplete when using pandas udf in Python 3

2019-07-30 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896132#comment-16896132 ] Li Jin edited comment on SPARK-28482 at 7/30/19 1:28 PM: - Hi [~jiangyu1211],

[jira] [Comment Edited] (SPARK-28482) Data incomplete when using pandas udf in Python 3

2019-07-30 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896132#comment-16896132 ] Li Jin edited comment on SPARK-28482 at 7/30/19 1:28 PM: - Hi [~jiangyu1211],

[jira] [Commented] (SPARK-28482) Data incomplete when using pandas udf in Python 3

2019-07-30 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896132#comment-16896132 ] Li Jin commented on SPARK-28482: Hi [~jiangyu1211], thank you for the bug report. From the

[jira] [Commented] (SPARK-28502) Error with struct conversion while using pandas_udf

2019-07-26 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16894134#comment-16894134 ] Li Jin commented on SPARK-28502: Hmm, I think this has something to do with the timezone. Can you try setting the

[jira] [Updated] (SPARK-28422) GROUPED_AGG pandas_udf doesn't with spark.sql() without group by clause

2019-07-17 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-28422: --- Description:   {code:java} @pandas_udf('double', PandasUDFType.GROUPED_AGG) def max_udf(v): return

[jira] [Updated] (SPARK-28422) GROUPED_AGG pandas_udf doesn't with spark.sql() without group by clause

2019-07-17 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-28422: --- Summary: GROUPED_AGG pandas_udf doesn't with spark.sql() without group by clause (was: GROUPED_AGG

[jira] [Updated] (SPARK-28422) GROUPED_AGG pandas_udf doesn't with spark.sql() without group by clause

2019-07-17 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-28422: --- Description:   {code:java} @pandas_udf('double', PandasUDFType.GROUPED_AGG) def max_udf(v): return

[jira] [Created] (SPARK-28422) GROUPED_AGG pandas_udf doesn't with spark.sql without group by clause

2019-07-17 Thread Li Jin (JIRA)
Li Jin created SPARK-28422: -- Summary: GROUPED_AGG pandas_udf doesn't with spark.sql without group by clause Key: SPARK-28422 URL: https://issues.apache.org/jira/browse/SPARK-28422 Project: Spark

[jira] [Comment Edited] (SPARK-28006) User-defined grouped transform pandas_udf for window operations

2019-06-13 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16863186#comment-16863186 ] Li Jin edited comment on SPARK-28006 at 6/13/19 3:36 PM: - Hi [~viirya] good

[jira] [Commented] (SPARK-28006) User-defined grouped transform pandas_udf for window operations

2019-06-13 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16863186#comment-16863186 ] Li Jin commented on SPARK-28006: Hi [~viirya] good questions: >> Can we use pandas agg udfs as window

[jira] [Commented] (SPARK-27463) Support Dataframe Cogroup via Pandas UDFs

2019-06-13 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16863177#comment-16863177 ] Li Jin commented on SPARK-27463: Yeah I think the exact spelling of the API can go either way. I think

[jira] [Updated] (SPARK-28006) User-defined grouped transform pandas_udf for window operations

2019-06-12 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-28006: --- Description: Currently, pandas_udf supports "grouped aggregate" type that can be used with unbounded and

[jira] [Commented] (SPARK-28006) User-defined grouped transform pandas_udf for window operations

2019-06-12 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16862505#comment-16862505 ] Li Jin commented on SPARK-28006: Thanks [~hyukjin.kwon] for the comments! I updated the description to

[jira] [Updated] (SPARK-28006) User-defined grouped transform pandas_udf for window operations

2019-06-12 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-28006: --- Description: Currently, pandas_udf supports "grouped aggregate" type that can be used with unbounded and

[jira] [Updated] (SPARK-28006) User-defined grouped transform pandas_udf for window operations

2019-06-12 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-28006: --- Description: Currently, pandas_udf supports "grouped aggregate" type that can be used with unbounded and

[jira] [Commented] (SPARK-27463) Support Dataframe Cogroup via Pandas UDFs

2019-06-12 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16862287#comment-16862287 ] Li Jin commented on SPARK-27463: I think one way to design this API is to mimic the existing dataset

[jira] [Commented] (SPARK-27463) Support Dataframe Cogroup via Pandas UDFs

2019-06-12 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16862225#comment-16862225 ] Li Jin commented on SPARK-27463: For cogroup, I don't think there is an analogous API in pandas. There is

[jira] [Commented] (SPARK-28006) User-defined grouped transform pandas_udf for window operations

2019-06-11 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16861559#comment-16861559 ] Li Jin commented on SPARK-28006: cc [~hyukjin.kwon] [~LI,Xiao] [~ueshin] [~bryanc] I think code wise

[jira] [Updated] (SPARK-28006) User-defined grouped transform pandas_udf for window operations

2019-06-11 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-28006: --- Description: Currently, pandas_udf supports "grouped aggregate" type that can be used with unbounded and

[jira] [Created] (SPARK-28006) User-defined grouped transform pandas_udf for window operations

2019-06-11 Thread Li Jin (JIRA)
Li Jin created SPARK-28006: -- Summary: User-defined grouped transform pandas_udf for window operations Key: SPARK-28006 URL: https://issues.apache.org/jira/browse/SPARK-28006 Project: Spark Issue

[jira] [Updated] (SPARK-28003) spark.createDataFrame with Arrow doesn't work with pandas.NaT

2019-06-11 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-28003: --- Affects Version/s: (was: 2.3.2) 2.3.3 > spark.createDataFrame with Arrow doesn't

[jira] [Created] (SPARK-28003) spark.createDataFrame with Arrow doesn't work with pandas.NaT

2019-06-11 Thread Li Jin (JIRA)
Li Jin created SPARK-28003: -- Summary: spark.createDataFrame with Arrow doesn't work with pandas.NaT Key: SPARK-28003 URL: https://issues.apache.org/jira/browse/SPARK-28003 Project: Spark Issue

[jira] [Updated] (SPARK-28003) spark.createDataFrame with Arrow doesn't work with pandas.NaT

2019-06-11 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-28003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-28003: --- Affects Version/s: (was: 2.4.0) 2.3.2 2.4.3 >

[jira] [Commented] (SPARK-27538) sparksql could not start in jdk11, exception org.datanucleus.exceptions.NucleusException: The java type java.lang.Long (jdbc-type='', sql-type="") cant be mapped for t

2019-05-29 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-27538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16851185#comment-16851185 ] Li Jin commented on SPARK-27538: [~hyukjin.kwon] I saw you closed this. I wonder if this should be a sub

[jira] [Commented] (SPARK-26410) Support per Pandas UDF configuration

2018-12-21 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726824#comment-16726824 ] Li Jin commented on SPARK-26410: One thing we want to think about is whether or not to mix different size

[jira] [Commented] (SPARK-26410) Support per Pandas UDF configuration

2018-12-21 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726820#comment-16726820 ] Li Jin commented on SPARK-26410: Thanks for the explanation. I think it makes sense to have batch size

[jira] [Commented] (SPARK-26412) Allow Pandas UDF to take an iterator of pd.DataFrames for the entire partition

2018-12-20 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725957#comment-16725957 ] Li Jin commented on SPARK-26412: So this is similar to the mapPartitions API in Scala but instead of

[jira] [Commented] (SPARK-26410) Support per Pandas UDF configuration

2018-12-20 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16725948#comment-16725948 ] Li Jin commented on SPARK-26410: I am curious why a user would want to configure maxRecordsPerBatch? As

[jira] [Created] (SPARK-26364) Clean up import statements in pandas udf tests

2018-12-13 Thread Li Jin (JIRA)
Li Jin created SPARK-26364: -- Summary: Clean up import statements in pandas udf tests Key: SPARK-26364 URL: https://issues.apache.org/jira/browse/SPARK-26364 Project: Spark Issue Type: Improvement

[jira] [Resolved] (SPARK-26328) Use GenerateOrdering for group key comparison in WindowExec

2018-12-10 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin resolved SPARK-26328. Resolution: Not A Problem > Use GenerateOrdering for group key comparison in WindowExec >

[jira] [Created] (SPARK-26328) Use GenerateOrdering for group key comparison in WindowExec

2018-12-10 Thread Li Jin (JIRA)
Li Jin created SPARK-26328: -- Summary: Use GenerateOrdering for group key comparison in WindowExec Key: SPARK-26328 URL: https://issues.apache.org/jira/browse/SPARK-26328 Project: Spark Issue Type:

[jira] [Updated] (SPARK-25640) Clarify/Improve EvalType for grouped aggregate and window aggregate

2018-10-04 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-25640: --- Description: Currently, grouped aggregate and window aggregate use different EvalTypes; however, they map

[jira] [Created] (SPARK-25640) Clarify/Improve EvalType for grouped aggregate and window aggregate

2018-10-04 Thread Li Jin (JIRA)
Li Jin created SPARK-25640: -- Summary: Clarify/Improve EvalType for grouped aggregate and window aggregate Key: SPARK-25640 URL: https://issues.apache.org/jira/browse/SPARK-25640 Project: Spark

[jira] [Commented] (SPARK-25213) DataSourceV2 doesn't seem to produce unsafe rows

2018-08-28 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16594973#comment-16594973 ] Li Jin commented on SPARK-25213: This is resolved by https://github.com/apache/spark/pull/22104 >

[jira] [Updated] (SPARK-25216) Provide better error message when a column contains dot and needs backticks quote

2018-08-23 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-25216: --- Description: The current error message is often confusing to a new Spark user that a column containing

[jira] [Updated] (SPARK-25216) Provide better error message when a column contains dot and needs backticks quote

2018-08-23 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-25216: --- Description: The current error message is often confusing to a new Spark user that a column containing

[jira] [Created] (SPARK-25216) Provide better error message when a column contains dot and needs backticks quote

2018-08-23 Thread Li Jin (JIRA)
Li Jin created SPARK-25216: -- Summary: Provide better error message when a column contains dot and needs backticks quote Key: SPARK-25216 URL: https://issues.apache.org/jira/browse/SPARK-25216 Project: Spark

[jira] [Created] (SPARK-25213) DataSourceV2 doesn't seem to produce unsafe rows

2018-08-23 Thread Li Jin (JIRA)
Li Jin created SPARK-25213: -- Summary: DataSourceV2 doesn't seem to produce unsafe rows Key: SPARK-25213 URL: https://issues.apache.org/jira/browse/SPARK-25213 Project: Spark Issue Type: Task

[jira] [Commented] (SPARK-24561) User-defined window functions with pandas udf (bounded window)

2018-08-14 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579965#comment-16579965 ] Li Jin commented on SPARK-24561: I am looking into this. Early investigation: 

[jira] [Updated] (SPARK-24721) Failed to use PythonUDF with literal inputs in filter with data sources

2018-08-14 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-24721: --- Component/s: SQL > Failed to use PythonUDF with literal inputs in filter with data sources >

[jira] [Updated] (SPARK-24721) Failed to use PythonUDF with literal inputs in filter with data sources

2018-08-14 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-24721: --- Issue Type: Bug (was: Sub-task) Parent: (was: SPARK-22216) > Failed to use PythonUDF with

[jira] [Comment Edited] (SPARK-24721) Failed to use PythonUDF with literal inputs in filter with data sources

2018-08-14 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579956#comment-16579956 ] Li Jin edited comment on SPARK-24721 at 8/14/18 3:26 PM: - Updated Jira title to

[jira] [Commented] (SPARK-24721) Failed to use PythonUDF with literal inputs in filter with data sources

2018-08-14 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579956#comment-16579956 ] Li Jin commented on SPARK-24721: Updates Jira title to reflect the actual issue > Failed to use

[jira] [Updated] (SPARK-24721) Failed to use PythonUDF with literal inputs in filter with data sources

2018-08-14 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-24721: --- Summary: Failed to use PythonUDF with literal inputs in filter with data sources (was: Failed to call

[jira] [Comment Edited] (SPARK-24721) Failed to call PythonUDF whose input is the output of another PythonUDF

2018-07-27 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16560337#comment-16560337 ] Li Jin edited comment on SPARK-24721 at 7/27/18 9:18 PM: - I think the issue is

[jira] [Commented] (SPARK-24721) Failed to call PythonUDF whose input is the output of another PythonUDF

2018-07-27 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16560337#comment-16560337 ] Li Jin commented on SPARK-24721: I think the issue is the UDF is being pushed down to the

[jira] [Commented] (SPARK-24721) Failed to call PythonUDF whose input is the output of another PythonUDF

2018-07-27 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16560283#comment-16560283 ] Li Jin commented on SPARK-24721: {code:java} from pyspark.sql.functions import udf, lit, col

[jira] [Updated] (SPARK-24624) Can not mix vectorized and non-vectorized UDFs

2018-07-15 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-24624: --- Issue Type: Sub-task (was: Improvement) Parent: SPARK-22216 > Can not mix vectorized and

[jira] [Updated] (SPARK-24721) Failed to call PythonUDF whose input is the output of another PythonUDF

2018-07-15 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-24721: --- Issue Type: Sub-task (was: Improvement) Parent: SPARK-22216 > Failed to call PythonUDF whose input

[jira] [Commented] (SPARK-24721) Failed to call PythonUDF whose input is the output of another PythonUDF

2018-07-15 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544656#comment-16544656 ] Li Jin commented on SPARK-24721: I am currently traveling but will try to take a look when I get back >

[jira] [Updated] (SPARK-24796) Support GROUPED_AGG_PANDAS_UDF in Pivot

2018-07-15 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-24796: --- Issue Type: Sub-task (was: Improvement) Parent: SPARK-22216 > Support GROUPED_AGG_PANDAS_UDF in

[jira] [Commented] (SPARK-24796) Support GROUPED_AGG_PANDAS_UDF in Pivot

2018-07-15 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544655#comment-16544655 ] Li Jin commented on SPARK-24796: Sorry I am traveling now but I will try to take a look when I get back

[jira] [Commented] (SPARK-24760) Pandas UDF does not handle NaN correctly

2018-07-09 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16537662#comment-16537662 ] Li Jin commented on SPARK-24760: I think the issue here is that the output schema for the UDF is not

[jira] [Commented] (SPARK-24721) Failed to call PythonUDF whose input is the output of another PythonUDF

2018-07-02 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530462#comment-16530462 ] Li Jin commented on SPARK-24721: Yep I can take a look > Failed to call PythonUDF whose input is the

[jira] [Commented] (SPARK-24624) Can not mix vectorized and non-vectorized UDFs

2018-06-21 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16519590#comment-16519590 ] Li Jin commented on SPARK-24624: I can take a look at this one > Can not mix vectorized and

[jira] [Comment Edited] (SPARK-24578) Reading remote cache block behavior changes and causes timeout issue

2018-06-18 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16515879#comment-16515879 ] Li Jin edited comment on SPARK-24578 at 6/18/18 3:24 PM: - cc @gatorsmile

[jira] [Commented] (SPARK-24578) Reading remote cache block behavior changes and causes timeout issue

2018-06-18 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16515879#comment-16515879 ] Li Jin commented on SPARK-24578: cc @gatorsmile We found this when switching from 2.2.1 to 2.3.0 in one

[jira] [Updated] (SPARK-24578) Reading remote cache block behavior changes and causes timeout issue

2018-06-18 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-24578: --- Component/s: (was: Input/Output) Spark Core > Reading remote cache block behavior

[jira] [Commented] (SPARK-24563) Allow running PySpark shell without Hive

2018-06-14 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16512844#comment-16512844 ] Li Jin commented on SPARK-24563: Will submit a PR soon > Allow running PySpark shell without Hive >

[jira] [Updated] (SPARK-24563) Allow running PySpark shell without Hive

2018-06-14 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-24563: --- Description: A previous commit: 

[jira] [Created] (SPARK-24563) Allow running PySpark shell without Hive

2018-06-14 Thread Li Jin (JIRA)
Li Jin created SPARK-24563: -- Summary: Allow running PySpark shell without Hive Key: SPARK-24563 URL: https://issues.apache.org/jira/browse/SPARK-24563 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-22239) User-defined window functions with pandas udf (unbounded window)

2018-06-14 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-22239: --- Description: Window functions are another place where we can benefit from vectorized UDFs and add another useful

[jira] [Created] (SPARK-24561) User-defined window functions with pandas udf (bounded window)

2018-06-14 Thread Li Jin (JIRA)
Li Jin created SPARK-24561: -- Summary: User-defined window functions with pandas udf (bounded window) Key: SPARK-24561 URL: https://issues.apache.org/jira/browse/SPARK-24561 Project: Spark Issue

[jira] [Updated] (SPARK-22239) User-defined window functions with pandas udf (unbounded window)

2018-06-14 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-22239: --- Summary: User-defined window functions with pandas udf (unbounded window) (was: User-defined window

[jira] [Commented] (SPARK-22239) User-defined window functions with pandas udf

2018-06-13 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16511495#comment-16511495 ] Li Jin commented on SPARK-22239: [~hyukjin.kwon] I actually don't think this Jira is done. The PR only

[jira] [Created] (SPARK-24521) Fix ineffective test in CachedTableSuite

2018-06-11 Thread Li Jin (JIRA)
Li Jin created SPARK-24521: -- Summary: Fix ineffective test in CachedTableSuite Key: SPARK-24521 URL: https://issues.apache.org/jira/browse/SPARK-24521 Project: Spark Issue Type: Test

[jira] [Commented] (SPARK-24258) SPIP: Improve PySpark support for ML Matrix and Vector types

2018-06-08 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506399#comment-16506399 ] Li Jin commented on SPARK-24258: I ran into [~mengxr] and chatted about this. Seems a good first step is

[jira] [Commented] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data when the analyzed plans are different after re-analyzing the plans

2018-05-30 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16495695#comment-16495695 ] Li Jin commented on SPARK-24373: [~smilegator] Thank you for the suggestion. > "df.cache() df.count()"

[jira] [Comment Edited] (SPARK-22947) SPIP: as-of join in Spark SQL

2018-05-29 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16493815#comment-16493815 ] Li Jin edited comment on SPARK-22947 at 5/29/18 4:34 PM: - Hi [~TomaszGaweda]

[jira] [Comment Edited] (SPARK-22947) SPIP: as-of join in Spark SQL

2018-05-29 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16493815#comment-16493815 ] Li Jin edited comment on SPARK-22947 at 5/29/18 4:34 PM: - Hi [~TomaszGaweda]

[jira] [Commented] (SPARK-22947) SPIP: as-of join in Spark SQL

2018-05-29 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16493815#comment-16493815 ] Li Jin commented on SPARK-22947: Hi [~TomaszGaweda] thanks for your interest! Yes I am willing to work

[jira] [Comment Edited] (SPARK-22947) SPIP: as-of join in Spark SQL

2018-05-29 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449935#comment-16449935 ] Li Jin edited comment on SPARK-22947 at 5/29/18 4:33 PM: - I came across this

[jira] [Commented] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data when the analyzed plans are different after re-analyzing the plans

2018-05-25 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491309#comment-16491309 ] Li Jin commented on SPARK-24373: [~smilegator] do you mean adding AnalysisBarrier to

[jira] [Commented] (SPARK-24324) Pandas Grouped Map UserDefinedFunction mixes column labels

2018-05-25 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491125#comment-16491125 ] Li Jin commented on SPARK-24324: Moved under Spark-22216 for better ticket organization. > Pandas

[jira] [Updated] (SPARK-24324) Pandas Grouped Map UserDefinedFunction mixes column labels

2018-05-25 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-24324: --- Issue Type: Sub-task (was: Bug) Parent: SPARK-22216 > Pandas Grouped Map UserDefinedFunction mixes

[jira] [Commented] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data

2018-05-25 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491086#comment-16491086 ] Li Jin commented on SPARK-24373: We use groupby() and pivot() > "df.cache() df.count()" no longer

[jira] [Comment Edited] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data

2018-05-24 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489060#comment-16489060 ] Li Jin edited comment on SPARK-24373 at 5/24/18 9:00 PM: - This is a reproduction:

[jira] [Comment Edited] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data

2018-05-24 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489060#comment-16489060 ] Li Jin edited comment on SPARK-24373 at 5/24/18 9:00 PM: - This is a reproduction:

[jira] [Comment Edited] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data

2018-05-24 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489060#comment-16489060 ] Li Jin edited comment on SPARK-24373 at 5/24/18 8:51 PM: - This is a reproduction:

[jira] [Commented] (SPARK-24324) Pandas Grouped Map UserDefinedFunction mixes column labels

2018-05-24 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489759#comment-16489759 ] Li Jin commented on SPARK-24324: This is a dup of https://issues.apache.org/jira/browse/SPARK-23929, I am

[jira] [Commented] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data

2018-05-24 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16489060#comment-16489060 ] Li Jin commented on SPARK-24373: This is a reproduction in a unit test: {code:java} test("cache and count") {

[jira] [Updated] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data

2018-05-23 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-24373: --- Summary: "df.cache() df.count()" no longer eagerly caches data (was: Spark Dataset groupby.agg/count

[jira] [Comment Edited] (SPARK-24373) Spark Dataset groupby.agg/count doesn't respect cache with UDF

2018-05-23 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488109#comment-16488109 ] Li Jin edited comment on SPARK-24373 at 5/23/18 11:18 PM: -- We found after

[jira] [Updated] (SPARK-24373) Spark Dataset groupby.agg/count doesn't respect cache with UDF

2018-05-23 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-24373: --- Component/s: (was: Input/Output) SQL > Spark Dataset groupby.agg/count doesn't respect

[jira] [Commented] (SPARK-24373) Spark Dataset groupby.agg/count doesn't respect cache with UDF

2018-05-23 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488109#comment-16488109 ] Li Jin commented on SPARK-24373: I think this might be a regression from 2.2. Anyone uses "df.cache()

[jira] [Commented] (SPARK-24334) Race condition in ArrowPythonRunner causes unclean shutdown of Arrow memory allocator

2018-05-22 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484086#comment-16484086 ] Li Jin commented on SPARK-24334: [~pi3ni0] did it happen for you when your UDF throws exception? > Race

[jira] [Commented] (SPARK-24334) Race condition in ArrowPythonRunner causes unclean shutdown of Arrow memory allocator

2018-05-21 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483004#comment-16483004 ] Li Jin commented on SPARK-24334: I have done some investigation and will submit a PR soon. > Race

[jira] [Created] (SPARK-24334) Race condition in ArrowPythonRunner causes unclean shutdown of Arrow memory allocator

2018-05-21 Thread Li Jin (JIRA)
Li Jin created SPARK-24334: -- Summary: Race condition in ArrowPythonRunner causes unclean shutdown of Arrow memory allocator Key: SPARK-24334 URL: https://issues.apache.org/jira/browse/SPARK-24334 Project:

[jira] [Commented] (SPARK-22239) User-defined window functions with pandas udf

2018-04-25 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16452977#comment-16452977 ] Li Jin commented on SPARK-22239: [~hvanhovell], I have done a bit further research of UDF over rolling

[jira] [Commented] (SPARK-23929) pandas_udf schema mapped by position and not by name

2018-04-25 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16452355#comment-16452355 ] Li Jin commented on SPARK-23929: [~tr3w] does using OrderedDict help in your case? > pandas_udf schema

[jira] [Comment Edited] (SPARK-22947) SPIP: as-of join in Spark SQL

2018-04-24 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449935#comment-16449935 ] Li Jin edited comment on SPARK-22947 at 4/24/18 2:16 PM: - I came across this blog

[jira] [Commented] (SPARK-22947) SPIP: as-of join in Spark SQL

2018-04-24 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449935#comment-16449935 ] Li Jin commented on SPARK-22947: I came across this blog today:

[jira] [Updated] (SPARK-24019) AnalysisException for Window function expression to compute derivative

2018-04-20 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-24019: --- Component/s: (was: Spark Core) SQL > AnalysisException for Window function expression

[jira] [Comment Edited] (SPARK-23929) pandas_udf schema mapped by position and not by name

2018-04-15 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438935#comment-16438935 ] Li Jin edited comment on SPARK-23929 at 4/16/18 2:40 AM: - I agree with

[jira] [Commented] (SPARK-23929) pandas_udf schema mapped by position and not by name

2018-04-15 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438935#comment-16438935 ] Li Jin commented on SPARK-23929: I agree with [~hyukjin.kwon]. Seems like there is not a strong enough

[jira] [Commented] (SPARK-23030) Decrease memory consumption with toPandas() collection using Arrow

2018-04-13 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438001#comment-16438001 ] Li Jin commented on SPARK-23030: Hey [~bryanc], did you by any chance have some progress on this? I guess
