[jira] [Commented] (SPARK-26645) CSV infer schema bug infers decimal(9,-1)

2020-11-25 Punit Shah (Jira)
[ https://issues.apache.org/jira/browse/SPARK-26645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238798#comment-17238798 ] Punit Shah commented on SPARK-26645: Hello [~dongjoon] If we can get this PR then this would be
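The issue title refers to an inferred decimal type with a negative scale. The sketch below assumes that CSV schema inference takes precision and scale from a Java `BigDecimal` parse of the field. It uses Python's `decimal` module as a stand-in for `BigDecimal` to show how a value in exponent notation can end up with a negative scale. The helper name and the sample value `1.23456789E+9` are hypothetical and do not come from the issue.

```python
from decimal import Decimal
from typing import Tuple

def bigdecimal_precision_scale(text: str) -> Tuple[int, int]:
    """Hypothetical helper approximating Java BigDecimal(text).precision()
    and .scale() for a finite numeric string."""
    t = Decimal(text).as_tuple()
    precision = len(t.digits)  # significant digits in the unscaled value
    scale = -t.exponent        # BigDecimal scale is the negated base-10 exponent
    return precision, scale

for value in ["123.45", "1.23456789E+9"]:
    p, s = bigdecimal_precision_scale(value)
    print(f"{value!r} -> decimal({p},{s})")
```

Here `1.23456789E+9` equals `123456789 × 10^1`. That gives 9 significant digits and a scale of -1, which is the `decimal(9,-1)` shape named in the title.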

[jira] [Commented] (SPARK-33445) Can't parse decimal type from csv file

2020-11-18 Punit Shah (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17234919#comment-17234919 ] Punit Shah commented on SPARK-33445: Thank you very much [~dongjoon] > Can't parse decimal type

[jira] [Commented] (SPARK-33445) Can't parse decimal type from csv file

2020-11-18 Punit Shah (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17234851#comment-17234851 ] Punit Shah commented on SPARK-33445: My apologies [~dongjoon] for the incorrect tags.  Please let me

[jira] [Comment Edited] (SPARK-33445) Can't parse decimal type from csv file

2020-11-17 Punit Shah (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17233589#comment-17233589 ] Punit Shah edited comment on SPARK-33445 at 11/17/20, 1:38 PM: ---

[jira] [Reopened] (SPARK-33445) Can't parse decimal type from csv file

2020-11-17 Punit Shah (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Punit Shah reopened SPARK-33445: As per the issue description, the call to spark_session.schema results in error.  Not

[jira] [Updated] (SPARK-33445) Can't parse decimal type from csv file

2020-11-13 Punit Shah (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Punit Shah updated SPARK-33445: --- Attachment: tsd.csv > Can't parse decimal type from csv file >

[jira] [Created] (SPARK-33445) Can't parse decimal type from csv file

2020-11-13 Punit Shah (Jira)
Punit Shah created SPARK-33445: -- Summary: Can't parse decimal type from csv file Key: SPARK-33445 URL: https://issues.apache.org/jira/browse/SPARK-33445 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-33327) grouped by first and last against date column returns incorrect results

2020-11-05 Punit Shah (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Punit Shah updated SPARK-33327: --- Description: The attached csv file has two columns, namely "User" and "FromDate".  The import

[jira] [Commented] (SPARK-33327) grouped by first and last against date column returns incorrect results

2020-11-05 Punit Shah (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17226712#comment-17226712 ] Punit Shah commented on SPARK-33327: The correct behaviour of running the query should be: cnt,

[jira] [Updated] (SPARK-33327) grouped by first and last against date column returns incorrect results

2020-11-03 Punit Shah (Jira)
[ https://issues.apache.org/jira/browse/SPARK-33327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Punit Shah updated SPARK-33327: --- Attachment: users.csv > grouped by first and last against date column returns incorrect results >

[jira] [Created] (SPARK-33327) grouped by first and last against date column returns incorrect results

2020-11-03 Punit Shah (Jira)
Punit Shah created SPARK-33327: -- Summary: grouped by first and last against date column returns incorrect results Key: SPARK-33327 URL: https://issues.apache.org/jira/browse/SPARK-33327 Project: Spark

[jira] [Reopened] (SPARK-32965) pyspark reading csv files with utf_16le encoding

2020-10-01 Punit Shah (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Punit Shah reopened SPARK-32965: The linked duplicate issue won't be fixed because the issue was mixed with a multiline feature

[jira] [Commented] (SPARK-32965) pyspark reading csv files with utf_16le encoding

2020-09-23 Punit Shah (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200743#comment-17200743 ] Punit Shah commented on SPARK-32965: It looks similar.  I've attached a utf-16le file to this

[jira] [Updated] (SPARK-32965) pyspark reading csv files with utf_16le encoding

2020-09-23 Punit Shah (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Punit Shah updated SPARK-32965: --- Attachment: 32965.png > pyspark reading csv files with utf_16le encoding >

[jira] [Updated] (SPARK-32965) pyspark reading csv files with utf_16le encoding

2020-09-23 Punit Shah (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Punit Shah updated SPARK-32965: --- Attachment: 16le.csv > pyspark reading csv files with utf_16le encoding >

[jira] [Created] (SPARK-32965) pyspark reading csv files with utf_16le encoding

2020-09-22 Punit Shah (Jira)
Punit Shah created SPARK-32965: -- Summary: pyspark reading csv files with utf_16le encoding Key: SPARK-32965 URL: https://issues.apache.org/jira/browse/SPARK-32965 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-32956) Duplicate Columns in a csv file

2020-09-22 Punit Shah (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200163#comment-17200163 ] Punit Shah commented on SPARK-32956: That may work > Duplicate Columns in a csv file >

[jira] [Created] (SPARK-32956) Duplicate Columns in a csv file

2020-09-21 Punit Shah (Jira)
Punit Shah created SPARK-32956: -- Summary: Duplicate Columns in a csv file Key: SPARK-32956 URL: https://issues.apache.org/jira/browse/SPARK-32956 Project: Spark Issue Type: Bug

[jira] [Closed] (SPARK-32888) reading a parallized rdd with two identical records results in a zero count df when read via spark.read.csv

2020-09-16 Punit Shah (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Punit Shah closed SPARK-32888. -- Resolved by adding documentation > reading a parallized rdd with two identical records results in a zero

[jira] [Commented] (SPARK-32888) reading a parallized rdd with two identical records results in a zero count df when read via spark.read.csv

2020-09-16 Punit Shah (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17197069#comment-17197069 ] Punit Shah commented on SPARK-32888: Thank you for your reply [~viirya]  However what I've noticed

[jira] [Comment Edited] (SPARK-32888) reading a parallized rdd with two identical records results in a zero count df when read via spark.read.csv

2020-09-16 Punit Shah (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17196985#comment-17196985 ] Punit Shah edited comment on SPARK-32888 at 9/16/20, 2:55 PM: -- Why do we

[jira] [Commented] (SPARK-32888) reading a parallized rdd with two identical records results in a zero count df when read via spark.read.csv

2020-09-16 Punit Shah (Jira)
[ https://issues.apache.org/jira/browse/SPARK-32888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17196985#comment-17196985 ] Punit Shah commented on SPARK-32888: Why do we remove lines that are the same as the header? The
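The comment asks why lines identical to the header are removed. The pure-Python sketch below is a hypothetical model, not Spark's code. It assumes the CSV reader is called with a header option enabled on a collection of lines, and it contrasts two policies: dropping every line equal to the header, which matches the zero-count result in the issue title for two identical records, and dropping only the first line. Both function names are illustrative.

```python
from typing import List

def drop_header_lines(lines: List[str]) -> List[str]:
    """Hypothetical model: treat the first line as the header and drop
    every line equal to it, wherever it appears."""
    if not lines:
        return []
    header = lines[0]
    return [line for line in lines if line != header]

def skip_first_line(lines: List[str]) -> List[str]:
    """Hypothetical alternative: drop only the first line."""
    return lines[1:]

# Two identical records; with the header option on, the first one
# also serves as the header.
records = ["a,b", "a,b"]
print(len(drop_header_lines(records)))  # 0 data rows
print(len(skip_first_line(records)))    # 1 data row
```

One plausible reason for the equality-based policy is that, once input is split across partitions, only one partition actually starts with the header. Comparing each line with the header is a rule that does not depend on partition boundaries, at the cost of also discarding data rows that happen to match the header.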

[jira] [Created] (SPARK-32888) reading a parallized rdd with two identical records results in a zero count df when read via spark.read.csv

2020-09-15 Punit Shah (Jira)
Punit Shah created SPARK-32888: -- Summary: reading a parallized rdd with two identical records results in a zero count df when read via spark.read.csv Key: SPARK-32888 URL: