[jira] [Commented] (SPARK-32888) reading a parallized rdd with two identical records results in a zero count df when read via spark.read.csv

2020-09-16 Thread L. C. Hsieh (Jira)

[ https://issues.apache.org/jira/browse/SPARK-32888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17197073#comment-17197073 ] L. C. Hsieh commented on SPARK-32888: Yes, there is a difference. But it is due to reading file and

[jira] [Commented] (SPARK-32888) reading a parallized rdd with two identical records results in a zero count df when read via spark.read.csv

2020-09-16 Thread Punit Shah (Jira)

[ https://issues.apache.org/jira/browse/SPARK-32888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17197069#comment-17197069 ] Punit Shah commented on SPARK-32888: Thank you for your reply [~viirya]. However, what I've noticed

[jira] [Commented] (SPARK-32888) reading a parallized rdd with two identical records results in a zero count df when read via spark.read.csv

2020-09-16 Thread L. C. Hsieh (Jira)

[ https://issues.apache.org/jira/browse/SPARK-32888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17197046#comment-17197046 ] L. C. Hsieh commented on SPARK-32888: Reading CSV files is simple. We can just remove the first line.

[jira] [Commented] (SPARK-32888) reading a parallized rdd with two identical records results in a zero count df when read via spark.read.csv

2020-09-16 Thread Punit Shah (Jira)

[ https://issues.apache.org/jira/browse/SPARK-32888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17196985#comment-17196985 ] Punit Shah commented on SPARK-32888: Why do we remove lines that are the same as the header? The

[jira] [Commented] (SPARK-32888) reading a parallized rdd with two identical records results in a zero count df when read via spark.read.csv

2020-09-15 Thread Apache Spark (Jira)

[ https://issues.apache.org/jira/browse/SPARK-32888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17196591#comment-17196591 ] Apache Spark commented on SPARK-32888: User 'viirya' has created a pull request for this issue:

[jira] [Commented] (SPARK-32888) reading a parallized rdd with two identical records results in a zero count df when read via spark.read.csv

2020-09-15 Thread L. C. Hsieh (Jira)

[ https://issues.apache.org/jira/browse/SPARK-32888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17196590#comment-17196590 ] L. C. Hsieh commented on SPARK-32888: This was documented in the CSV-related code, although it seems

6 matches