[
https://issues.apache.org/jira/browse/SPARK-27224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801802#comment-16801802
]
Jurriaan Pruis commented on SPARK-27224:
[~hyukjin.kwon] is this the same as
[
https://issues.apache.org/jira/browse/SPARK-17914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801795#comment-16801795
]
Jurriaan Pruis commented on SPARK-17914:
I'm also seeing this issue where the millisecond part
[
https://issues.apache.org/jira/browse/SPARK-16753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15397223#comment-15397223
]
Jurriaan Pruis edited comment on SPARK-16753 at 7/28/16 8:02 AM:
-
[~rxin]
[
https://issues.apache.org/jira/browse/SPARK-16753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jurriaan Pruis updated SPARK-16753:
---
Attachment: screenshot-1.png
> Spark SQL doesn't handle skewed dataset joins properly
>
[
https://issues.apache.org/jira/browse/SPARK-16753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15397223#comment-15397223
]
Jurriaan Pruis commented on SPARK-16753:
[~rxin]
I've set the following options:
{code}
[
https://issues.apache.org/jira/browse/SPARK-16753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15396209#comment-15396209
]
Jurriaan Pruis commented on SPARK-16753:
It's not only more memory, they also take up a lot more
[
https://issues.apache.org/jira/browse/SPARK-16753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15396078#comment-15396078
]
Jurriaan Pruis commented on SPARK-16753:
This looks like a skew problem to me since the tasks
[
https://issues.apache.org/jira/browse/SPARK-16753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15396053#comment-15396053
]
Jurriaan Pruis commented on SPARK-16753:
[~rxin] Do you know something about this? I've seen a
Jurriaan Pruis created SPARK-16753:
--
Summary: Spark SQL doesn't handle skewed dataset joins properly
Key: SPARK-16753
URL: https://issues.apache.org/jira/browse/SPARK-16753
Project: Spark
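For readers unfamiliar with what "skew" means for a join, the following is a minimal pure-Python sketch, not code from the ticket and not Spark itself. It assumes rows are assigned to partitions by `key % num_partitions` (a simplification of hash partitioning); the data and the `partition_sizes` helper are hypothetical.

```python
from collections import Counter


def partition_sizes(keys, num_partitions):
    """Count how many rows land in each partition under key % num_partitions."""
    sizes = Counter(key % num_partitions for key in keys)
    return [sizes.get(p, 0) for p in range(num_partitions)]


# Hypothetical data: key 0 is "hot" and accounts for 900 of the 1000 rows.
keys = [0] * 900 + list(range(1, 101))

sizes = partition_sizes(keys, num_partitions=4)
print(sizes)  # → [925, 25, 25, 25]
```

One partition ends up with almost all of the rows, so the task that processes it would take far longer and need far more memory than the others.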
[
https://issues.apache.org/jira/browse/SPARK-16252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15353471#comment-15353471
]
Jurriaan Pruis commented on SPARK-16252:
Awesome! It works, indeed!
> Full Outer join with
[
https://issues.apache.org/jira/browse/SPARK-16252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jurriaan Pruis updated SPARK-16252:
---
Description:
{code}
>>> from pyspark.sql.functions import lit, coalesce
>>> data1 = [[1,2],
Jurriaan Pruis created SPARK-16252:
--
Summary: Full Outer join with literal column results in incorrect
result
Key: SPARK-16252
URL: https://issues.apache.org/jira/browse/SPARK-16252
Project: Spark
[
https://issues.apache.org/jira/browse/SPARK-15326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15343775#comment-15343775
]
Jurriaan Pruis commented on SPARK-15326:
[~hvanhovell] unfortunately that doesn't work. The
[
https://issues.apache.org/jira/browse/SPARK-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341703#comment-15341703
]
Jurriaan Pruis commented on SPARK-15393:
That's interesting because my example worked just fine
[
https://issues.apache.org/jira/browse/SPARK-15654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15307425#comment-15307425
]
Jurriaan Pruis commented on SPARK-15654:
You need to override maxSplitBytes, not
[
https://issues.apache.org/jira/browse/SPARK-15654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15306866#comment-15306866
]
Jurriaan Pruis commented on SPARK-15654:
Sorry, not sure about other formats. So this is due to
[
https://issues.apache.org/jira/browse/SPARK-15654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15306731#comment-15306731
]
Jurriaan Pruis commented on SPARK-15654:
cc [~davies] [~marmbrus] I saw you guys worked on code
Jurriaan Pruis created SPARK-15654:
--
Summary: Reading gzipped files results in duplicate rows
Key: SPARK-15654
URL: https://issues.apache.org/jira/browse/SPARK-15654
Project: Spark
Issue
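A report like this can be cross-checked outside Spark by reading the gzipped file directly and looking for repeated lines. The sketch below uses only Python's standard `gzip` module; the `duplicated_lines` helper and the sample file are hypothetical and do not come from the ticket.

```python
import gzip
import os
import tempfile
from collections import Counter


def duplicated_lines(path):
    """Return a dict of lines that occur more than once in a gzipped text file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        counts = Counter(line.rstrip("\n") for line in fh)
    return {line: n for line, n in counts.items() if n > 1}


# Hypothetical sample file with three distinct rows.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.txt.gz")
with gzip.open(sample_path, "wt", encoding="utf-8") as fh:
    fh.write("a\nb\nc\n")

dupes = duplicated_lines(sample_path)
print(dupes)  # → {}
```

If the source file has no duplicates but the loaded DataFrame does, the duplication was introduced while reading rather than being present in the data.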
[
https://issues.apache.org/jira/browse/SPARK-13638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15305263#comment-15305263
]
Jurriaan Pruis commented on SPARK-13638:
[~rxin] Sure!
> Support for saving with a quote mode
>
[
https://issues.apache.org/jira/browse/SPARK-13638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300899#comment-15300899
]
Jurriaan Pruis commented on SPARK-13638:
[~rxin] I think having quoteAll on by default is a bit
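To illustrate what a quote-all mode changes in CSV output, here is a small sketch using Python's standard `csv` module. It is an analogy only: it does not use Spark's CSV writer, and the `render` helper and sample rows are hypothetical.

```python
import csv
import io


def render(rows, quoting):
    """Serialize rows to a CSV string using the given csv quoting mode."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=quoting, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


rows = [["id", "note"], [1, "plain"], [2, "has, comma"]]

minimal = render(rows, csv.QUOTE_MINIMAL)  # quote only fields that need it
quote_all = render(rows, csv.QUOTE_ALL)    # quote every field

print(minimal)
print(quote_all)
```

With minimal quoting only the field containing a comma is quoted; with quote-all every field, including the numeric `id`, is wrapped in quotes, which makes files larger and turns numbers into quoted strings for downstream readers.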
[
https://issues.apache.org/jira/browse/SPARK-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jurriaan Pruis updated SPARK-15493:
---
Description:
See
Jurriaan Pruis created SPARK-15493:
--
Summary: Allow setting the quoteEscapingEnabled flag when writing
CSV
Key: SPARK-15493
URL: https://issues.apache.org/jira/browse/SPARK-15493
Project: Spark
[
https://issues.apache.org/jira/browse/SPARK-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15295477#comment-15295477
]
Jurriaan Pruis commented on SPARK-15393:
[~hyukjin.kwon] I reproduced it again using pyspark
[
https://issues.apache.org/jira/browse/SPARK-14343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15295471#comment-15295471
]
Jurriaan Pruis commented on SPARK-14343:
[~davies] This is still an issue on Spark 2.0 (You only
[
https://issues.apache.org/jira/browse/SPARK-15415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15292828#comment-15292828
]
Jurriaan Pruis commented on SPARK-15415:
[~rxin] I could try to work on that. The reason I ran
Jurriaan Pruis created SPARK-15415:
--
Summary: Marking partitions for broadcast broken
Key: SPARK-15415
URL: https://issues.apache.org/jira/browse/SPARK-15415
Project: Spark
Issue Type: Bug
[
https://issues.apache.org/jira/browse/SPARK-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jurriaan Pruis updated SPARK-15393:
---
Description:
Writing empty dataframes is broken on latest master.
It omits the metadata and
[
https://issues.apache.org/jira/browse/SPARK-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jurriaan Pruis updated SPARK-15393:
---
Summary: Writing empty Dataframes doesn't save any _metadata files (was:
Writing empty
[
https://issues.apache.org/jira/browse/SPARK-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289797#comment-15289797
]
Jurriaan Pruis commented on SPARK-15393:
Ping [~hyukjin.kwon]
> Writing empty Dataframes broken
Jurriaan Pruis created SPARK-15393:
--
Summary: Writing empty Dataframes broken
Key: SPARK-15393
URL: https://issues.apache.org/jira/browse/SPARK-15393
Project: Spark
Issue Type: Bug
[
https://issues.apache.org/jira/browse/SPARK-14959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286236#comment-15286236
]
Jurriaan Pruis commented on SPARK-14959:
As you can see in the description writing is also broken
[
https://issues.apache.org/jira/browse/SPARK-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15283819#comment-15283819
]
Jurriaan Pruis commented on SPARK-15327:
I took a quick look for the cause of this problem and it
[
https://issues.apache.org/jira/browse/SPARK-15327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jurriaan Pruis updated SPARK-15327:
---
Attachment: full_exception.txt
See attached file for the full exception / generated code.
>
Jurriaan Pruis created SPARK-15327:
--
Summary: Catalyst code generation fails with complex data structure
Key: SPARK-15327
URL: https://issues.apache.org/jira/browse/SPARK-15327
Project: Spark
[
https://issues.apache.org/jira/browse/SPARK-15326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jurriaan Pruis updated SPARK-15326:
---
Summary: Doing multiple unions on a Dataframe will result in a very
inefficient query plan
[
https://issues.apache.org/jira/browse/SPARK-15326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jurriaan Pruis updated SPARK-15326:
---
Attachment: Query Plan.pdf
Also added a PDF of the Query Plan as shown in the web interface.
[
https://issues.apache.org/jira/browse/SPARK-15326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jurriaan Pruis updated SPARK-15326:
---
Description:
While working with a very skewed dataset I noticed that repeated unions on a
[
https://issues.apache.org/jira/browse/SPARK-15326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jurriaan Pruis updated SPARK-15326:
---
Comment: was deleted
(was: The example code)
> Doing multiple union on a Dataframe will
[
https://issues.apache.org/jira/browse/SPARK-15326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jurriaan Pruis updated SPARK-15326:
---
Comment: was deleted
(was: The extended query plan generated by the example)
> Doing
[
https://issues.apache.org/jira/browse/SPARK-15326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jurriaan Pruis updated SPARK-15326:
---
Attachment: skewed_join_plan.txt
The extended query plan generated by the example
> Doing
[
https://issues.apache.org/jira/browse/SPARK-15326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jurriaan Pruis updated SPARK-15326:
---
Description:
While working with a very skewed dataset I noticed that repeated unions on a
[
https://issues.apache.org/jira/browse/SPARK-15326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jurriaan Pruis updated SPARK-15326:
---
Attachment: skewed_join.py
The example code
> Doing multiple union on a Dataframe will
Jurriaan Pruis created SPARK-15326:
--
Summary: Doing multiple union on a Dataframe will result in a very
inefficient query plan
Key: SPARK-15326
URL: https://issues.apache.org/jira/browse/SPARK-15326
[
https://issues.apache.org/jira/browse/SPARK-15323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jurriaan Pruis updated SPARK-15323:
---
Description:
{code}
sqlContext.read.format("text").load("...")
{code}
Is broken for
Jurriaan Pruis created SPARK-15323:
--
Summary: read with format=text is broken for partitioned tables in
Spark 2.0
Key: SPARK-15323
URL: https://issues.apache.org/jira/browse/SPARK-15323
Project:
[
https://issues.apache.org/jira/browse/SPARK-14463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15283044#comment-15283044
]
Jurriaan Pruis commented on SPARK-14463:
Actually, this functionality is broken (explicitly
[
https://issues.apache.org/jira/browse/SPARK-14959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280895#comment-15280895
]
Jurriaan Pruis commented on SPARK-14959:
I have the same issue reading a partitioned parquet
[
https://issues.apache.org/jira/browse/SPARK-14463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15271207#comment-15271207
]
Jurriaan Pruis commented on SPARK-14463:
Any idea if
[
https://issues.apache.org/jira/browse/SPARK-15127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jurriaan Pruis updated SPARK-15127:
---
Description:
I think I found a bug in the way columns are handled in (py)Spark
h3. How to
Jurriaan Pruis created SPARK-15127:
--
Summary: Column names are handled incorrectly when they originate
from a single Dataframe
Key: SPARK-15127
URL: https://issues.apache.org/jira/browse/SPARK-15127
[
https://issues.apache.org/jira/browse/SPARK-14343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jurriaan Pruis updated SPARK-14343:
---
Environment: Mac OS X 10.11.4 / Ubuntu 16.04 LTS (was: Mac OS X 10.11.4)
> Dataframe
[
https://issues.apache.org/jira/browse/SPARK-14463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15246211#comment-15246211
]
Jurriaan Pruis commented on SPARK-14463:
Why? I guess this can be quite useful, at least while
[
https://issues.apache.org/jira/browse/SPARK-14343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15244669#comment-15244669
]
Jurriaan Pruis commented on SPARK-14343:
On the spark 2.0.0 nightly build it doesn't work at all:
[
https://issues.apache.org/jira/browse/SPARK-14343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jurriaan Pruis updated SPARK-14343:
---
Affects Version/s: 2.0.0
> Dataframe operations on a partitioned dataset (using partition
[
https://issues.apache.org/jira/browse/SPARK-14343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jurriaan Pruis updated SPARK-14343:
---
Description:
When reading a dataset using {{sqlContext.read.text()}} queries on the
Jurriaan Pruis created SPARK-14343:
--
Summary: Dataframe operations on a partitioned dataset (using
partition discovery) return invalid results
Key: SPARK-14343
URL:
58 matches