[jira] [Commented] (SPARK-19578) Poor pyspark performance + incorrect UI input-size metrics

2017-03-01 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15890588#comment-15890588 ] Nicholas Chammas commented on SPARK-19578: -- [~holdenk] - Would it make sense to have PySpark's

[jira] [Commented] (SPARK-18381) Wrong date conversion between spark and python for dates before 1583

2017-02-28 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888776#comment-15888776 ] Nicholas Chammas commented on SPARK-18381: -- Oh, and to provide additional information on why

[jira] [Commented] (SPARK-18381) Wrong date conversion between spark and python for dates before 1583

2017-02-28 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888752#comment-15888752 ] Nicholas Chammas commented on SPARK-18381: -- I am seeing a very similar issue when trying to read
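One plausible mechanism, offered as an assumption rather than the diagnosis reached in this thread: Spark 2.x handles dates on the JVM with a hybrid Julian/Gregorian calendar (Julian before 1582-10-15), while Python's `datetime.date` uses the proleptic Gregorian calendar. As a result, the same physical day can carry different labels on each side. The stdlib sketch below shows the size of that offset by going through Julian Day Numbers. The function names are hypothetical.

```python
from datetime import date

# Offset between Julian Day Numbers (JDN) and Python's proleptic Gregorian
# ordinals: date(1, 1, 1).toordinal() == 1 corresponds to JDN 1721426.
_JDN_TO_ORDINAL = 1721425


def julian_calendar_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a date expressed in the Julian calendar."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def julian_to_proleptic_gregorian(year: int, month: int, day: int) -> date:
    """Re-express a Julian-calendar date in the proleptic Gregorian calendar
    that Python's datetime.date uses (same physical day, different label)."""
    jdn = julian_calendar_to_jdn(year, month, day)
    return date.fromordinal(jdn - _JDN_TO_ORDINAL)


# The last Julian day before the 1582 cutover is labelled 10 days later
# in the proleptic Gregorian calendar.
print(julian_to_proleptic_gregorian(1582, 10, 4))  # 1582-10-14
```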

[jira] [Commented] (SPARK-19553) Add GroupedData.countApprox()

2017-02-16 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15870780#comment-15870780 ] Nicholas Chammas commented on SPARK-19553: -- The utility of 1) would be being able to count items

[jira] [Commented] (SPARK-19578) Poor pyspark performance + incorrect UI input-size metrics

2017-02-13 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864097#comment-15864097 ] Nicholas Chammas commented on SPARK-19578: -- I'm seeing the same thing too. You can get a much

[jira] [Commented] (SPARK-19553) Add GroupedData.countApprox()

2017-02-13 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864000#comment-15864000 ] Nicholas Chammas commented on SPARK-19553: -- Quick API question for you [~marmbrus]: Is this

[jira] [Commented] (SPARK-19553) Add GroupedData.countApprox()

2017-02-10 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861735#comment-15861735 ] Nicholas Chammas commented on SPARK-19553: -- I needed something like this today. I was profiling

[jira] [Created] (SPARK-19553) Add GroupedData.countApprox()

2017-02-10 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-19553: Summary: Add GroupedData.countApprox() Key: SPARK-19553 URL: https://issues.apache.org/jira/browse/SPARK-19553 Project: Spark Issue Type:

[jira] (SPARK-12559) Standalone cluster mode doesn't work with --packages

2017-01-30 Thread Nicholas Chammas (JIRA)
Nicholas Chammas commented on SPARK-12559

[jira] [Commented] (SPARK-19216) LogisticRegressionModel is missing getThreshold()

2017-01-17 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826819#comment-15826819 ] Nicholas Chammas commented on SPARK-19216: -- Ah, thanks. I suppose this should become a sub-task

[jira] [Commented] (SPARK-19217) Offer easy cast from vector to array

2017-01-16 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824751#comment-15824751 ] Nicholas Chammas commented on SPARK-19217: -- Ah OK, good to know. I was testing with 2.0.2, which

[jira] [Reopened] (SPARK-2141) Add sc.getPersistentRDDs() to PySpark

2017-01-16 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas reopened SPARK-2141: - > Add sc.getPersistentRDDs() to PySpark

[jira] [Commented] (SPARK-2141) Add sc.getPersistentRDDs() to PySpark

2017-01-16 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824360#comment-15824360 ] Nicholas Chammas commented on SPARK-2141: - I'd like to reopen this issue given the fact that the

[jira] [Commented] (SPARK-19248) Regex_replace works in 1.6 but not in 2.0

2017-01-16 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824334#comment-15824334 ] Nicholas Chammas commented on SPARK-19248: -- Testing this out, it looks like 2.1 shows the same

[jira] [Updated] (SPARK-19217) Offer easy cast from vector to array

2017-01-16 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-19217: - Description: Working with ML often means working with DataFrames with vector columns.

[jira] [Commented] (SPARK-19217) Offer easy cast from vector to array

2017-01-16 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824175#comment-15824175 ] Nicholas Chammas commented on SPARK-19217: -- [~mlnick] - I'm seeing this when I try to write as

[jira] [Comment Edited] (SPARK-19217) Offer easy cast from vector to array

2017-01-16 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824175#comment-15824175 ] Nicholas Chammas edited comment on SPARK-19217 at 1/16/17 3:41 PM: ---

[jira] [Commented] (SPARK-18492) GeneratedIterator grows beyond 64 KB

2017-01-14 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822940#comment-15822940 ] Nicholas Chammas commented on SPARK-18492: -- Actually, on second look, I'm not entirely sure

[jira] [Commented] (SPARK-18492) GeneratedIterator grows beyond 64 KB

2017-01-14 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822939#comment-15822939 ] Nicholas Chammas commented on SPARK-18492: -- Oh, it looks like this issue is duplicated by

[jira] [Commented] (SPARK-18492) GeneratedIterator grows beyond 64 KB

2017-01-14 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822934#comment-15822934 ] Nicholas Chammas commented on SPARK-18492: -- I suppose the "correct" solution is to make code

[jira] [Created] (SPARK-19217) Offer easy cast from vector to array

2017-01-13 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-19217: Summary: Offer easy cast from vector to array Key: SPARK-19217 URL: https://issues.apache.org/jira/browse/SPARK-19217 Project: Spark Issue Type:

[jira] [Commented] (SPARK-19216) LogisticRegressionModel is missing getThreshold()

2017-01-13 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822094#comment-15822094 ] Nicholas Chammas commented on SPARK-19216: -- cc [~josephkb] - Is this a valid gap in Python's

[jira] [Created] (SPARK-19216) LogisticRegressionModel is missing getThreshold()

2017-01-13 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-19216: Summary: LogisticRegressionModel is missing getThreshold() Key: SPARK-19216 URL: https://issues.apache.org/jira/browse/SPARK-19216 Project: Spark

[jira] [Created] (SPARK-19106) Styling for the configuration docs is broken

2017-01-06 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-19106: Summary: Styling for the configuration docs is broken Key: SPARK-19106 URL: https://issues.apache.org/jira/browse/SPARK-19106 Project: Spark Issue

[jira] [Updated] (SPARK-19106) Styling for the configuration docs is broken

2017-01-06 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-19106: - Attachment: Screen Shot 2017-01-06 at 10.20.52 AM.png

[jira] [Commented] (SPARK-18866) Codegen fails with cryptic error if regexp_replace() output column is not aliased

2017-01-03 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15796227#comment-15796227 ] Nicholas Chammas commented on SPARK-18866: -- Could be. I guess the issue of aliasing somehow

[jira] [Commented] (SPARK-16402) JDBC source: Implement save API

2016-12-29 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15785990#comment-15785990 ] Nicholas Chammas commented on SPARK-16402: -- [~JustinPihony], [~smilegator] - Does the resolution

[jira] [Commented] (SPARK-18492) GeneratedIterator grows beyond 64 KB

2016-12-19 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15762706#comment-15762706 ] Nicholas Chammas commented on SPARK-18492: -- Yup, I'm seeing the same high-level behavior as

[jira] [Commented] (SPARK-18492) GeneratedIterator grows beyond 64 KB

2016-12-16 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755742#comment-15755742 ] Nicholas Chammas commented on SPARK-18492: -- I'm hitting this problem as well when I try to apply

[jira] [Updated] (SPARK-18866) Codegen fails with cryptic error if regexp_replace() output column is not aliased

2016-12-14 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-18866: - Description: Here's a minimal repro: {code} import pyspark from pyspark.sql import

[jira] [Commented] (SPARK-18866) Codegen fails with cryptic error if regexp_replace() output column is not aliased

2016-12-14 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749815#comment-15749815 ] Nicholas Chammas commented on SPARK-18866: -- cc [~hvanhovell]

[jira] [Created] (SPARK-18866) Codegen fails with cryptic error if regexp_replace() output column is not aliased

2016-12-14 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-18866: Summary: Codegen fails with cryptic error if regexp_replace() output column is not aliased Key: SPARK-18866 URL: https://issues.apache.org/jira/browse/SPARK-18866

[jira] [Commented] (SPARK-13587) Support virtualenv in PySpark

2016-12-11 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15740419#comment-15740419 ] Nicholas Chammas commented on SPARK-13587: -- Thanks to a lot of help from [~quasi...@gmail.com]

[jira] [Created] (SPARK-18818) Window...orderBy() should accept an 'ascending' parameter just like DataFrame.orderBy()

2016-12-10 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-18818: Summary: Window...orderBy() should accept an 'ascending' parameter just like DataFrame.orderBy() Key: SPARK-18818 URL: https://issues.apache.org/jira/browse/SPARK-18818

[jira] [Commented] (SPARK-14932) Allow DataFrame.replace() to replace values with None

2016-12-09 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15735525#comment-15735525 ] Nicholas Chammas commented on SPARK-14932: -- My goal is to be able to do something like this:

[jira] [Created] (SPARK-18719) Document spark.ui.showConsoleProgress

2016-12-05 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-18719: Summary: Document spark.ui.showConsoleProgress Key: SPARK-18719 URL: https://issues.apache.org/jira/browse/SPARK-18719 Project: Spark Issue Type:

[jira] [Commented] (SPARK-13587) Support virtualenv in PySpark

2016-12-01 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15713636#comment-15713636 ] Nicholas Chammas commented on SPARK-13587: -- [~tsp]: {quote} Previously, I have had reasonable

[jira] [Updated] (SPARK-16589) Chained cartesian produces incorrect number of records

2016-11-30 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-16589: - Labels: correctness (was: )
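For reference, here is the invariant implied by the issue title. Chaining `cartesian()` over inputs of sizes |a|, |b| and |c| should yield exactly |a| * |b| * |c| records, each shaped ((x, y), z). The sketch below is a plain-Python oracle, not Spark code, and the helper name is hypothetical. A Spark result could be checked against it.

```python
from itertools import product


def chained_cartesian(a, b, c):
    """Plain-Python stand-in for a.cartesian(b).cartesian(c).

    RDD.cartesian() yields (x, y) pairs, so chaining it twice nests the
    first pair inside the second: ((x, y), z).
    """
    return list(product(product(a, b), c))


a, b, c = [1, 2], ["x", "y", "z"], [True, False]
result = chained_cartesian(a, b, c)
print(len(result))  # 12, i.e. 2 * 3 * 2
print(result[0])    # ((1, 'x'), True)
```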

[jira] [Commented] (SPARK-18589) persist() resolves "java.lang.RuntimeException: Invalid PythonUDF (...), requires attributes from more than one child"

2016-11-25 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15696717#comment-15696717 ] Nicholas Chammas commented on SPARK-18589: -- cc [~davies] [~hvanhovell]

[jira] [Created] (SPARK-18589) persist() resolves "java.lang.RuntimeException: Invalid PythonUDF (...), requires attributes from more than one child"

2016-11-25 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-18589: Summary: persist() resolves "java.lang.RuntimeException: Invalid PythonUDF (...), requires attributes from more than one child" Key: SPARK-18589 URL:

[jira] [Commented] (SPARK-18495) Web UI should document meaning of green dot in DAG visualization

2016-11-17 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674980#comment-15674980 ] Nicholas Chammas commented on SPARK-18495: -- cc [~andrewor14]

[jira] [Created] (SPARK-18495) Web UI should document meaning of green dot in DAG visualization

2016-11-17 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-18495: Summary: Web UI should document meaning of green dot in DAG visualization Key: SPARK-18495 URL: https://issues.apache.org/jira/browse/SPARK-18495 Project:

[jira] [Commented] (SPARK-18367) DataFrame join spawns unreasonably high number of open files

2016-11-11 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15658284#comment-15658284 ] Nicholas Chammas commented on SPARK-18367: -- Looks like lowering {{bypassMergeThreshold}} even to
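The following back-of-envelope sketch suggests why lowering {{bypassMergeThreshold}} could matter. It assumes Spark 2.x sort-based shuffle semantics. When a shuffle has no map-side combine and at most `spark.shuffle.sort.bypassMergeThreshold` (default 200) reduce partitions, each map task writes one file per reduce partition. Open files then scale with running tasks * partitions. The function names are hypothetical, and the model ignores spill files, index files and shuffle-read handles, so treat its numbers as lower bounds.

```python
# Rough, hypothetical model of shuffle-write files held open at once by one
# executor under Spark 2.x sort-based shuffle. Ignores spill and index files.

DEFAULT_BYPASS_MERGE_THRESHOLD = 200  # spark.shuffle.sort.bypassMergeThreshold


def uses_bypass_merge_sort(num_reduce_partitions: int,
                           map_side_combine: bool,
                           threshold: int = DEFAULT_BYPASS_MERGE_THRESHOLD) -> bool:
    """Bypass path: no map-side combine and few enough reduce partitions."""
    return (not map_side_combine) and num_reduce_partitions <= threshold


def estimated_open_shuffle_files(concurrent_map_tasks: int,
                                 num_reduce_partitions: int,
                                 map_side_combine: bool = False,
                                 threshold: int = DEFAULT_BYPASS_MERGE_THRESHOLD) -> int:
    if uses_bypass_merge_sort(num_reduce_partitions, map_side_combine, threshold):
        # Each running map task keeps one output file open per reduce partition.
        return concurrent_map_tasks * num_reduce_partitions
    # Sort path: roughly one output file per running map task.
    return concurrent_map_tasks


# 8 concurrent tasks, 200 shuffle partitions (the spark.sql.shuffle.partitions
# default), default threshold: the bypass path applies.
print(estimated_open_shuffle_files(8, 200))                 # 1600
# A threshold below the partition count disables the bypass path.
print(estimated_open_shuffle_files(8, 200, threshold=100))  # 8
```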

[jira] [Commented] (SPARK-18084) write.partitionBy() does not recognize nested columns that select() can access

2016-11-11 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15657285#comment-15657285 ] Nicholas Chammas commented on SPARK-18084: -- It sounds like from Michael's comment that this is a

[jira] [Commented] (SPARK-18367) DataFrame join spawns unreasonably high number of open files

2016-11-10 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656254#comment-15656254 ] Nicholas Chammas commented on SPARK-18367: -- Ah, sounds like the correct explanation to me. So in

[jira] [Commented] (SPARK-18367) DataFrame join spawns unreasonably high number of open files

2016-11-10 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656230#comment-15656230 ] Nicholas Chammas commented on SPARK-18367: -- How are you monitoring the number of open files? I

[jira] [Commented] (SPARK-18367) DataFrame join spawns unreasonably high number of open files

2016-11-10 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656228#comment-15656228 ] Nicholas Chammas commented on SPARK-18367: -- Tomorrow I'll try running this on a Linux VM. Maybe

[jira] [Comment Edited] (SPARK-18367) DataFrame join spawns unreasonably high number of open files

2016-11-10 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656199#comment-15656199 ] Nicholas Chammas edited comment on SPARK-18367 at 11/11/16 5:30 AM:

[jira] [Commented] (SPARK-18367) DataFrame join spawns unreasonably high number of open files

2016-11-10 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656199#comment-15656199 ] Nicholas Chammas commented on SPARK-18367: -- Tweaked repro script to show partitions before and

[jira] [Commented] (SPARK-18367) DataFrame join spawns unreasonably high number of open files

2016-11-10 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656066#comment-15656066 ] Nicholas Chammas commented on SPARK-18367: -- I noticed that if I generate a DataFrame with fewer

[jira] [Comment Edited] (SPARK-18367) DataFrame join spawns unreasonably high number of open files

2016-11-10 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656019#comment-15656019 ] Nicholas Chammas edited comment on SPARK-18367 at 11/11/16 3:44 AM:

[jira] [Updated] (SPARK-18367) DataFrame join spawns unreasonably high number of open files

2016-11-10 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-18367: - Attachment: spark-lsof.txt Here is the output of {{lsof}} on all the pids owned by Spark

[jira] [Updated] (SPARK-18367) DataFrame join spawns unreasonably high number of open files

2016-11-10 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-18367: - Description: I have a moderately complex DataFrame query that spawns north of 10K open

[jira] [Commented] (SPARK-18367) DataFrame join spawns unreasonably high number of open files

2016-11-10 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655091#comment-15655091 ] Nicholas Chammas commented on SPARK-18367: -- I've updated the issue description with a minimal

[jira] [Updated] (SPARK-18367) DataFrame join spawns unreasonably high number of open files

2016-11-10 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-18367: - Attachment: (was: plan-with-limit.txt)

[jira] [Updated] (SPARK-18367) DataFrame join spawns unreasonably high number of open files

2016-11-10 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-18367: - Attachment: (was: plan-without-limit.txt)

[jira] [Updated] (SPARK-18367) DataFrame join spawns unreasonably high number of open files

2016-11-10 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-18367: - Summary: DataFrame join spawns unreasonably high number of open files (was: limit()

[jira] [Commented] (SPARK-18367) limit() makes the lame walk again

2016-11-10 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655032#comment-15655032 ] Nicholas Chammas commented on SPARK-18367: -- Scratch that. This is not related to UDFs.

[jira] [Commented] (SPARK-18367) limit() makes the lame walk again

2016-11-10 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15654770#comment-15654770 ] Nicholas Chammas commented on SPARK-18367: -- Looks like this is a fundamental problem with Python

[jira] [Commented] (SPARK-18367) limit() makes the lame walk again

2016-11-09 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15652968#comment-15652968 ] Nicholas Chammas commented on SPARK-18367: -- Even if I cut the number of records I'm processing

[jira] [Comment Edited] (SPARK-18367) limit() makes the lame walk again

2016-11-09 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15652406#comment-15652406 ] Nicholas Chammas edited comment on SPARK-18367 at 11/10/16 3:24 AM:

[jira] [Commented] (SPARK-18367) limit() makes the lame walk again

2016-11-09 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15652881#comment-15652881 ] Nicholas Chammas commented on SPARK-18367: -- To provide some context, this code base I'm

[jira] [Commented] (SPARK-18367) limit() makes the lame walk again

2016-11-09 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15652406#comment-15652406 ] Nicholas Chammas commented on SPARK-18367: -- I've spent the day trying to narrow down what is

[jira] [Commented] (SPARK-18367) limit() makes the lame walk again

2016-11-08 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649132#comment-15649132 ] Nicholas Chammas commented on SPARK-18367: -- On 2.0.x the caching is required due to SPARK-18254,

[jira] [Commented] (SPARK-18367) limit() makes the lame walk again

2016-11-08 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649063#comment-15649063 ] Nicholas Chammas commented on SPARK-18367: -- I'm not trying to write any files actually. In this

[jira] [Updated] (SPARK-18367) limit() makes the lame walk again

2016-11-08 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-18367: - Description: I have a complex DataFrame query that fails to run normally but succeeds if

[jira] [Updated] (SPARK-18367) limit() makes the lame walk again

2016-11-08 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-18367: - Attachment: plan-without-limit.txt plan-with-limit.txt

[jira] [Created] (SPARK-18367) limit() makes the lame walk again

2016-11-08 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-18367: Summary: limit() makes the lame walk again Key: SPARK-18367 URL: https://issues.apache.org/jira/browse/SPARK-18367 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-18277) na.fill() and friends should work on struct fields

2016-11-04 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637713#comment-15637713 ] Nicholas Chammas commented on SPARK-18277: -- {quote} If you try {{when()}}, you realize that you

[jira] [Commented] (SPARK-18277) na.fill() and friends should work on struct fields

2016-11-04 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637654#comment-15637654 ] Nicholas Chammas commented on SPARK-18277: -- Thanks for the pointer. I'll follow the discussion

[jira] [Comment Edited] (SPARK-18277) na.fill() and friends should work on struct fields

2016-11-04 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637566#comment-15637566 ] Nicholas Chammas edited comment on SPARK-18277 at 11/4/16 8:25 PM: ---

[jira] [Updated] (SPARK-18277) na.fill() and friends should work on struct fields

2016-11-04 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-18277: - Description: It appears that you cannot use {{fill()}} and friends to quickly modify

[jira] [Commented] (SPARK-18277) na.fill() and friends should work on struct fields

2016-11-04 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637566#comment-15637566 ] Nicholas Chammas commented on SPARK-18277: -- [~marmbrus] / [~yhuai]: Is there a workaround for

[jira] [Updated] (SPARK-18277) na.fill() and friends should work on struct fields

2016-11-04 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-18277: - Description: It appears that you cannot use {{fill()}} and friends to quickly modify

[jira] [Created] (SPARK-18277) na.fill() and friends should work on struct fields

2016-11-04 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-18277: Summary: na.fill() and friends should work on struct fields Key: SPARK-18277 URL: https://issues.apache.org/jira/browse/SPARK-18277 Project: Spark

[jira] [Commented] (SPARK-18128) Add support for publishing to PyPI

2016-11-04 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15636944#comment-15636944 ] Nicholas Chammas commented on SPARK-18128: -- For the record: Let's also check with the PyPI

[jira] [Commented] (SPARK-18128) Add support for publishing to PyPI

2016-11-04 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15636508#comment-15636508 ] Nicholas Chammas commented on SPARK-18128: -- [~prabinb] - See [this

[jira] [Commented] (SPARK-18128) Add support for publishing to PyPI

2016-11-04 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15636497#comment-15636497 ] Nicholas Chammas commented on SPARK-18128: -- For the record: A PyPI admin is looking into the

[jira] [Commented] (SPARK-18128) Add support for publishing to PyPI

2016-11-04 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15636500#comment-15636500 ] Nicholas Chammas commented on SPARK-18128: -- [~holdenk] - Shall we make this issue a subtask of

[jira] [Commented] (SPARK-18254) UDFs don't see aliased column names

2016-11-03 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634455#comment-15634455 ] Nicholas Chammas commented on SPARK-18254: -- So it was specifically some broken

[jira] [Comment Edited] (SPARK-18254) UDFs don't see aliased column names

2016-11-03 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634391#comment-15634391 ] Nicholas Chammas edited comment on SPARK-18254 at 11/3/16 9:58 PM: --- If

[jira] [Commented] (SPARK-18254) UDFs don't see aliased column names

2016-11-03 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634428#comment-15634428 ] Nicholas Chammas commented on SPARK-18254: -- Just tried it. Seems like the fix is only available

[jira] [Commented] (SPARK-18254) UDFs don't see aliased column names

2016-11-03 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634391#comment-15634391 ] Nicholas Chammas commented on SPARK-18254: -- If I try branch-2.1 on

[jira] [Comment Edited] (SPARK-18254) UDFs don't see aliased column names

2016-11-03 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15634391#comment-15634391 ] Nicholas Chammas edited comment on SPARK-18254 at 11/3/16 9:46 PM: --- If

[jira] [Commented] (SPARK-18254) UDFs don't see aliased column names

2016-11-03 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15633744#comment-15633744 ] Nicholas Chammas commented on SPARK-18254: -- Interestingly, if I add {{names_cleaned.persist()}}

[jira] [Comment Edited] (SPARK-18254) UDFs don't see aliased column names

2016-11-03 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15633446#comment-15633446 ] Nicholas Chammas edited comment on SPARK-18254 at 11/3/16 4:57 PM: ---

[jira] [Commented] (SPARK-18254) UDFs don't see aliased column names

2016-11-03 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15633446#comment-15633446 ] Nicholas Chammas commented on SPARK-18254: -- Yes, if I don't alias the columns and/or update

[jira] [Updated] (SPARK-18254) UDFs don't see aliased column names

2016-11-03 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-18254: - Description: Dunno if I'm misinterpreting something here, but this seems like a bug in

[jira] [Commented] (SPARK-18254) UDFs don't see aliased column names

2016-11-03 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15633424#comment-15633424 ] Nicholas Chammas commented on SPARK-18254: -- Yep, it works fine if the column names haven't been

[jira] [Updated] (SPARK-18254) UDFs don't see aliased column names

2016-11-03 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-18254: - Description: Dunno if I'm misinterpreting something here, but this seems like a bug in

[jira] [Updated] (SPARK-18254) UDFs don't see aliased column names

2016-11-03 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-18254: - Description: Dunno if I'm misinterpreting something here, but this seems like a bug in

[jira] [Updated] (SPARK-18254) UDFs don't see aliased column names

2016-11-03 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-18254: - Description: Dunno if I'm misinterpreting something here, but this seems like a bug in

[jira] [Updated] (SPARK-18254) UDFs don't see aliased column names

2016-11-03 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-18254: - Summary: UDFs don't see aliased column names (was: UDFs don't see aliased column names;

[jira] [Commented] (SPARK-18254) UDFs don't see aliased column names; somehow they get the original names

2016-11-03 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15633220#comment-15633220 ] Nicholas Chammas commented on SPARK-18254: -- [~marmbrus] / [~hvanhovell]: Is there a workaround

[jira] [Created] (SPARK-18254) UDFs don't see aliased column names; somehow they get the original names

2016-11-03 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-18254: Summary: UDFs don't see aliased column names; somehow they get the original names Key: SPARK-18254 URL: https://issues.apache.org/jira/browse/SPARK-18254

[jira] [Commented] (SPARK-16726) Improve `Union/Intersect/Except` error messages on incompatible types

2016-11-02 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15630597#comment-15630597 ] Nicholas Chammas commented on SPARK-16726: -- I just hit this error in 2.0.1 and it was this JIRA

[jira] [Commented] (SPARK-14900) spark.ml classification metrics should include accuracy

2016-10-29 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15618637#comment-15618637 ] Nicholas Chammas commented on SPARK-14900: -- I don't know if this belongs in a separate issue, or
