[jira] [Commented] (SPARK-19578) Poor pyspark performance + incorrect UI input-size metrics

2017-02-13 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864097#comment-15864097 ] Nicholas Chammas commented on SPARK-19578: -- I'm seeing the same thing too. You can get a much

[jira] [Commented] (SPARK-19553) Add GroupedData.countApprox()

2017-02-13 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15864000#comment-15864000 ] Nicholas Chammas commented on SPARK-19553: -- Quick API question for you [~marmbrus

Re: Order of rows not preserved after cache + count + coalesce

2017-02-13 Thread Nicholas Chammas
RDDs and DataFrames do not guarantee any specific ordering of data. They are like tables in a SQL database. The only way to get a guaranteed ordering of rows is to explicitly specify an orderBy() clause in your statement. Any ordering you see otherwise is incidental. On Mon, Feb 13, 2017 at
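A minimal sketch of the SQL-table analogy above, using Python's built-in sqlite3 module rather than Spark so that it runs without a cluster. The table name `events` and the helper `fetch_ids` are made up for illustration; in PySpark the corresponding call would be `df.orderBy("id")`.

```python
import sqlite3


def fetch_ids(conn, ordered):
    """Return the id column, optionally with an explicit ORDER BY."""
    query = "SELECT id FROM events"
    if ordered:
        query += " ORDER BY id"
    return [row[0] for row in conn.execute(query)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, label TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(3, "c"), (1, "a"), (2, "b")],
)

# Without ORDER BY the engine is free to return rows in any order.
unordered = fetch_ids(conn, ordered=False)

# With ORDER BY the row order is part of the query's contract.
ordered = fetch_ids(conn, ordered=True)

print("same rows:", sorted(unordered) == ordered)
```

Only the query with the explicit ORDER BY has a guaranteed row order; whatever order the first query happens to return is incidental.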

[jira] [Commented] (SPARK-19553) Add GroupedData.countApprox()

2017-02-10 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861735#comment-15861735 ] Nicholas Chammas commented on SPARK-19553: -- I needed something like this today. I was profiling

[jira] [Created] (SPARK-19553) Add GroupedData.countApprox()

2017-02-10 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-19553: Summary: Add GroupedData.countApprox() Key: SPARK-19553 URL: https://issues.apache.org/jira/browse/SPARK-19553 Project: Spark Issue Type

[jira] (SPARK-12559) Standalone cluster mode doesn't work with --packages

2017-01-30 Thread Nicholas Chammas (JIRA)
Nicholas Chammas commented on SPARK-12559

Re: Typo on spark.apache.org? "cyclic data flow"

2017-01-28 Thread Nicholas Chammas
Aye aye, cap'n. PR incoming. On Sat, Jan 28, 2017 at 2:44 PM Sean Owen <so...@cloudera.com> wrote: > Certainly a typo -- feel free to make a PR for the spark-website repo. > (Might search for other instances of 'cyclic' too) > > On Sat, Jan 28, 2017 at 7:18 P

Typo on spark.apache.org? "cyclic data flow"

2017-01-28 Thread Nicholas Chammas
The tagline on http://spark.apache.org/ says: "Apache Spark has an advanced DAG execution engine that supports cyclic data flow and in-memory computing." Isn't that supposed to be "acyclic" rather than "cyclic"? What does it mean to support cyclic data flow anyway? Nick
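For reference, a DAG is a directed acyclic graph: following the edges never leads back to a node already visited. A small sketch of that property, assuming Python 3.9+ for the standard-library graphlib module. The node names and the `is_acyclic` helper are hypothetical and say nothing about Spark's internals.

```python
from graphlib import CycleError, TopologicalSorter


def is_acyclic(deps):
    """Return True if the dependency graph has no cycle.

    `deps` maps each node to the set of nodes it depends on.
    """
    try:
        # static_order() is lazy; consuming it triggers the cycle check.
        list(TopologicalSorter(deps).static_order())
        return True
    except CycleError:
        return False


# A linear pipeline: each step depends only on the previous one.
lineage = {"map": {"read"}, "filter": {"map"}, "count": {"filter"}}

# Two steps that depend on each other form a cycle.
looped = {"a": {"b"}, "b": {"a"}}

print(is_acyclic(lineage), is_acyclic(looped))
```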

Re: welcoming Burak and Holden as committers

2017-01-24 Thread Nicholas Chammas
Congratulations, Burak and Holden. On Tue, Jan 24, 2017 at 1:27 PM Russell Spitzer wrote: > Great news! Congratulations! > > On Tue, Jan 24, 2017 at 10:25 AM Dean Wampler > wrote: > > Congratulations to both of you! > > dean > > *Dean

Re: Debugging a PythonException with no details

2017-01-17 Thread Nicholas Chammas
It seems it has to do with UDF..Could u share snippet of code you are > running? > Kr > > On 14 Jan 2017 1:40 am, "Nicholas Chammas" <nicholas.cham...@gmail.com> > wrote: > > I’m looking for tips on how to debug a PythonException that’s very sparse > on d

[jira] [Commented] (SPARK-19216) LogisticRegressionModel is missing getThreshold()

2017-01-17 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826819#comment-15826819 ] Nicholas Chammas commented on SPARK-19216: -- Ah, thanks. I suppose this should become a sub-task

[jira] [Commented] (SPARK-19217) Offer easy cast from vector to array

2017-01-16 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824751#comment-15824751 ] Nicholas Chammas commented on SPARK-19217: -- Ah OK, good to know. I was testing with 2.0.2, which

[jira] [Reopened] (SPARK-2141) Add sc.getPersistentRDDs() to PySpark

2017-01-16 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas reopened SPARK-2141: - > Add sc.getPersistentRDDs() to PySp

[jira] [Commented] (SPARK-2141) Add sc.getPersistentRDDs() to PySpark

2017-01-16 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824360#comment-15824360 ] Nicholas Chammas commented on SPARK-2141: - I'd like to reopen this issue given the fact

[jira] [Commented] (SPARK-19248) Regex_replace works in 1.6 but not in 2.0

2017-01-16 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824334#comment-15824334 ] Nicholas Chammas commented on SPARK-19248: -- Testing this out, it looks like 2.1 shows the same

[jira] [Updated] (SPARK-19217) Offer easy cast from vector to array

2017-01-16 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-19217: - Description: Working with ML often means working with DataFrames with vector columns

[jira] [Commented] (SPARK-19217) Offer easy cast from vector to array

2017-01-16 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824175#comment-15824175 ] Nicholas Chammas commented on SPARK-19217: -- [~mlnick] - I'm seeing this when I try to write

[jira] [Comment Edited] (SPARK-19217) Offer easy cast from vector to array

2017-01-16 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824175#comment-15824175 ] Nicholas Chammas edited comment on SPARK-19217 at 1/16/17 3:41 PM

[jira] [Commented] (SPARK-18492) GeneratedIterator grows beyond 64 KB

2017-01-14 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822940#comment-15822940 ] Nicholas Chammas commented on SPARK-18492: -- Actually, on second look, I'm not entirely sure

[jira] [Commented] (SPARK-18492) GeneratedIterator grows beyond 64 KB

2017-01-14 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822939#comment-15822939 ] Nicholas Chammas commented on SPARK-18492: -- Oh, it looks like this issue is duplicated by SPARK

[jira] [Commented] (SPARK-18492) GeneratedIterator grows beyond 64 KB

2017-01-14 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822934#comment-15822934 ] Nicholas Chammas commented on SPARK-18492: -- I suppose the "correct" solution is to

Debugging a PythonException with no details

2017-01-13 Thread Nicholas Chammas
I’m looking for tips on how to debug a PythonException that’s very sparse on details. The full exception is below, but the only interesting bits appear to be the following lines: org.apache.spark.api.python.PythonException: ... py4j.protocol.Py4JError: An error occurred while calling

[jira] [Created] (SPARK-19217) Offer easy cast from vector to array

2017-01-13 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-19217: Summary: Offer easy cast from vector to array Key: SPARK-19217 URL: https://issues.apache.org/jira/browse/SPARK-19217 Project: Spark Issue Type

[jira] [Commented] (SPARK-19216) LogisticRegressionModel is missing getThreshold()

2017-01-13 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822094#comment-15822094 ] Nicholas Chammas commented on SPARK-19216: -- cc [~josephkb] - Is this a valid gap in Python's API

[jira] [Created] (SPARK-19216) LogisticRegressionModel is missing getThreshold()

2017-01-13 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-19216: Summary: LogisticRegressionModel is missing getThreshold() Key: SPARK-19216 URL: https://issues.apache.org/jira/browse/SPARK-19216 Project: Spark

[jira] [Created] (SPARK-19106) Styling for the configuration docs is broken

2017-01-06 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-19106: Summary: Styling for the configuration docs is broken Key: SPARK-19106 URL: https://issues.apache.org/jira/browse/SPARK-19106 Project: Spark Issue

[jira] [Updated] (SPARK-19106) Styling for the configuration docs is broken

2017-01-06 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-19106: - Attachment: Screen Shot 2017-01-06 at 10.20.52 AM.png > Styling for the configurat

[jira] [Commented] (SPARK-18866) Codegen fails with cryptic error if regexp_replace() output column is not aliased

2017-01-03 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15796227#comment-15796227 ] Nicholas Chammas commented on SPARK-18866: -- Could be. I guess the issue of aliasing somehow

[jira] [Commented] (SPARK-16402) JDBC source: Implement save API

2016-12-29 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15785990#comment-15785990 ] Nicholas Chammas commented on SPARK-16402: -- [~JustinPihony], [~smilegator] - Does the resolution

[jira] [Commented] (SPARK-18492) GeneratedIterator grows beyond 64 KB

2016-12-19 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15762706#comment-15762706 ] Nicholas Chammas commented on SPARK-18492: -- Yup, I'm seeing the same high-level behavior as you

Re: [VOTE] Apache Spark 2.1.0 (RC5)

2016-12-19 Thread Nicholas Chammas
Since it’s not a regression from 2.0 (I believe the same issue affects both 2.0 and 2.1) it doesn’t merit a -1 vote according to the voting guidelines. Of course, it would be nice if we could fix the various optimizer issues that all seem to have a workaround that involves persist() (another one

[jira] [Commented] (SPARK-18492) GeneratedIterator grows beyond 64 KB

2016-12-16 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755742#comment-15755742 ] Nicholas Chammas commented on SPARK-18492: -- I'm hitting this problem as well when I try to apply

[jira] [Updated] (SPARK-18866) Codegen fails with cryptic error if regexp_replace() output column is not aliased

2016-12-14 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-18866: - Description: Here's a minimal repro: {code} import pyspark from pyspark.sql import

[jira] [Updated] (SPARK-18866) Codegen fails with cryptic error if regexp_replace() output column is not aliased

2016-12-14 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-18866: - Description: Here's a minimal repro: {code} import pyspark from pyspark.sql import

[jira] [Commented] (SPARK-18866) Codegen fails with cryptic error if regexp_replace() output column is not aliased

2016-12-14 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15749815#comment-15749815 ] Nicholas Chammas commented on SPARK-18866: -- cc [~hvanhovell] > Codegen fails with cryptic er

[jira] [Created] (SPARK-18866) Codegen fails with cryptic error if regexp_replace() output column is not aliased

2016-12-14 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-18866: Summary: Codegen fails with cryptic error if regexp_replace() output column is not aliased Key: SPARK-18866 URL: https://issues.apache.org/jira/browse/SPARK-18866

[jira] [Commented] (SPARK-13587) Support virtualenv in PySpark

2016-12-11 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15740419#comment-15740419 ] Nicholas Chammas commented on SPARK-13587: -- Thanks to a lot of help from [~quasi...@gmail.com

[jira] [Created] (SPARK-18818) Window...orderBy() should accept an 'ascending' parameter just like DataFrame.orderBy()

2016-12-10 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-18818: Summary: Window...orderBy() should accept an 'ascending' parameter just like DataFrame.orderBy() Key: SPARK-18818 URL: https://issues.apache.org/jira/browse/SPARK-18818

[jira] [Commented] (SPARK-14932) Allow DataFrame.replace() to replace values with None

2016-12-09 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15735525#comment-15735525 ] Nicholas Chammas commented on SPARK-14932: -- My goal is to be able to do something like

Re: unsubscribe

2016-12-08 Thread Nicholas Chammas
i.c...@rbc.com> wrote: > I’m pretty sure I didn’t. > > > > *From:* Nicholas Chammas [mailto:nicholas.cham...@gmail.com] > *Sent:* Thursday, December 08, 2016 10:56 AM > *To:* Chen, Yan I; Di Zhu > > > *Cc:* user @spark > *Subject:* Re: unsubscribe > >

Re: unsubscribe

2016-12-08 Thread Nicholas Chammas
bed, > but I still received this email. > > > > > > *From:* Nicholas Chammas [mailto:nicholas.cham...@gmail.com] > *Sent:* Thursday, December 08, 2016 10:02 AM > *To:* Di Zhu > *Cc:* user @spark > *Subject:* Re: unsubscribe > > > > Yes, sorry about

Re: unsubscribe

2016-12-08 Thread Nicholas Chammas
Yes, sorry about that. I didn't think before responding to all those who asked to unsubscribe. On Thu, Dec 8, 2016 at 10:00 AM Di Zhu <jason4zhu.bigd...@gmail.com> wrote: > Could you send to individual privately without cc to all users every time? > > > On 8 Dec 2016, at

Re: unsubscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Thu, Dec 8, 2016 at 7:46 AM Ramon Rosa da Silva wrote: > > This e-mail message, including any attachments, is for the sole use of

Re: Unsubscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Thu, Dec 8, 2016 at 9:46 AM Tao Lu wrote: > >

Re: unsubscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Thu, Dec 8, 2016 at 8:01 AM Niki Pavlopoulou wrote: > unsubscribe >

Re: unsubscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Thu, Dec 8, 2016 at 7:50 AM Juan Caravaca wrote: > unsubscribe >

Re: Unsubscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Thu, Dec 8, 2016 at 9:54 AM Kishorkumar Patil wrote: > >

Re: Unsubscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Thu, Dec 8, 2016 at 9:42 AM Chen, Yan I wrote: > > > > ___ > > If you

Re: Unsubscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Thu, Dec 8, 2016 at 12:17 AM Prashant Singh Thakur < prashant.tha...@impetus.co.in> wrote: > > > > > Best Regards, > > Prashant Thakur > > Work : 6046 > >

Re: Unsubscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Thu, Dec 8, 2016 at 12:08 AM Kranthi Gmail wrote: > > > -- > Kranthi > > PS: Sent from mobile, pls excuse the brevity and typos. > >

Re: Unsubscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Thu, Dec 8, 2016 at 6:27 AM Vinicius Barreto < vinicius.s.barr...@gmail.com> wrote: > Unsubscribe > > Em 7 de dez de 2016 17:46, "map reduced"

Re: Unsubscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Thu, Dec 8, 2016 at 12:54 AM Roger Holenweger wrote: > > > - > To

Re: unscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Thu, Dec 8, 2016 at 1:34 AM smith_666 wrote: > > > >

Re: Unsubscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Thu, Dec 8, 2016 at 12:12 AM Ajit Jaokar wrote: > > > - > To

Re: unsubscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Wed, Dec 7, 2016 at 10:53 PM Ajith Jose wrote: > >

Re: Reduce memory usage of UnsafeInMemorySorter

2016-12-07 Thread Nicholas Chammas
ub.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java#L156 > > Regards, > Kazuaki Ishizaki > > > > From:Reynold Xin <r...@databricks.com> > To:Nicholas Chammas <nicholas.cham...@gma

Reduce memory usage of UnsafeInMemorySorter

2016-12-06 Thread Nicholas Chammas
this, and the question is about internals like UnsafeInMemorySorter, I hope this is OK here. Nick On Mon, Dec 5, 2016 at 9:11 AM Nicholas Chammas <nicholas.cham...@gmail.com> wrote: I was testing out a new project at scale on Spark 2.0.2 running on YARN,

Re: Difference between netty and netty-all

2016-12-05 Thread Nicholas Chammas
You mean just for branch-2.0, right? On Mon, Dec 5, 2016 at 8:35 PM Shixiong(Ryan) Zhu <shixi...@databricks.com> wrote: > Hey Nick, > > It should be safe to upgrade Netty to the latest 4.0.x version. Could you > submit a PR, please? > > On Mon, Dec 5, 2016 at 11

Re: Difference between netty and netty-all

2016-12-05 Thread Nicholas Chammas
y/util/internal/ThreadLocalRandom.class > > On Mon, Dec 5, 2016 at 8:53 AM, Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > > I’m looking at the list of dependencies here: > > > https://github.com/apache/spark/search?l=Groff=netty=Code=%E2%9C%93 > &

Difference between netty and netty-all

2016-12-05 Thread Nicholas Chammas
I’m looking at the list of dependencies here: https://github.com/apache/spark/search?l=Groff=netty=Code=%E2%9C%93 What’s the difference between netty and netty-all? The reason I ask is because I’m looking at a Netty PR and trying to figure out if Spark

[jira] [Created] (SPARK-18719) Document spark.ui.showConsoleProgress

2016-12-05 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-18719: Summary: Document spark.ui.showConsoleProgress Key: SPARK-18719 URL: https://issues.apache.org/jira/browse/SPARK-18719 Project: Spark Issue Type

java.lang.IllegalStateException: There is no space for new record

2016-12-05 Thread Nicholas Chammas
I was testing out a new project at scale on Spark 2.0.2 running on YARN, and my job failed with an interesting error message: TaskSetManager: Lost task 37.3 in stage 31.0 (TID 10684, server.host.name): java.lang.IllegalStateException: There is no space for new record 05:27:09.573 at

Re: Future of the Python 2 support.

2016-12-04 Thread Nicholas Chammas
I don't think it makes sense to deprecate or drop support for Python 2.7 until at least 2020, when 2.7 itself will be EOLed. (As of Spark 2.0, Python 2.6 support is deprecated and will be removed by Spark 2.2. Python 2.7 is the only version of Python 2 that's still fully supported.) Given the

[jira] [Commented] (SPARK-13587) Support virtualenv in PySpark

2016-12-01 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15713636#comment-15713636 ] Nicholas Chammas commented on SPARK-13587: -- [~tsp]: {quote} Previously, I have had reasonable

Re: [VOTE] Apache Spark 2.1.0 (RC1)

2016-11-30 Thread Nicholas Chammas
> -1 (non binding) https://issues.apache.org/jira/browse/SPARK-16589 No matter how useless in practice this shouldn't go to another major release. I agree that that issue is a major one since it relates to correctness, but since it's not a regression it technically does not merit a -1 vote on the

[jira] [Updated] (SPARK-16589) Chained cartesian produces incorrect number of records

2016-11-30 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-16589: - Labels: correctness (was: ) > Chained cartesian produces incorrect number of reco

[jira] [Commented] (SPARK-18589) persist() resolves "java.lang.RuntimeException: Invalid PythonUDF (...), requires attributes from more than one child"

2016-11-25 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15696717#comment-15696717 ] Nicholas Chammas commented on SPARK-18589: -- cc [~davies] [~hvanhovell] > persist() resol

[jira] [Created] (SPARK-18589) persist() resolves "java.lang.RuntimeException: Invalid PythonUDF (...), requires attributes from more than one child"

2016-11-25 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-18589: Summary: persist() resolves "java.lang.RuntimeException: Invalid PythonUDF (...), requires attributes from more than one child" Key: SPARK-18589

Re: Memory leak warnings in Spark 2.0.1

2016-11-23 Thread Nicholas Chammas
Thanks for the reference and PR. On Wed, Nov 23, 2016 at 2:59 AM Reynold Xin <r...@databricks.com> wrote: > See https://issues.apache.org/jira/browse/SPARK-18557 > <https://issues.apache.org/jira/browse/SPARK-18557> > > On Mon, Nov 21, 2016 at 1:16 PM, Nicholas C

Re: Memory leak warnings in Spark 2.0.1

2016-11-21 Thread Nicholas Chammas
I'm also curious about this. Is there something we can do to help troubleshoot these leaks and file useful bug reports? On Wed, Oct 12, 2016 at 4:33 PM vonnagy wrote: > I am getting excessive memory leak warnings when running multiple mapping > and > aggregations and using

Re: Green dot in web UI DAG visualization

2016-11-17 Thread Nicholas Chammas
https://issues.apache.org/jira/browse/SPARK-18495 On Thu, Nov 17, 2016 at 12:23 PM Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > Nice catch Suhas, and thanks for the reference. Sounds like we need a > tweak to the UI so this little feature is self-documenting. > >

[jira] [Commented] (SPARK-18495) Web UI should document meaning of green dot in DAG visualization

2016-11-17 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674980#comment-15674980 ] Nicholas Chammas commented on SPARK-18495: -- cc [~andrewor14] > Web UI should document mean

[jira] [Created] (SPARK-18495) Web UI should document meaning of green dot in DAG visualization

2016-11-17 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-18495: Summary: Web UI should document meaning of green dot in DAG visualization Key: SPARK-18495 URL: https://issues.apache.org/jira/browse/SPARK-18495 Project

Re: Green dot in web UI DAG visualization

2016-11-17 Thread Nicholas Chammas
ry instead of from HDFS." > > from > https://databricks.com/blog/2015/06/22/understanding-your-spark-application-through-visualization.html > > On Thu, Nov 17, 2016 at 9:19 AM, Reynold Xin <r...@databricks.com> wrote: > > Ha funny. Never noticed that. > > > On

Re: Green dot in web UI DAG visualization

2016-11-17 Thread Nicholas Chammas
Hmm... somehow the image didn't show up. How about now? [image: Screen Shot 2016-11-17 at 11.57.14 AM.png] On Thu, Nov 17, 2016 at 12:14 PM Herman van Hövell tot Westerflier < hvanhov...@databricks.com> wrote: Should I be able to see something? On Thu, Nov 17, 2016 at 9:10 AM, Ni

Green dot in web UI DAG visualization

2016-11-17 Thread Nicholas Chammas
Some questions about this DAG visualization: [image: Screen Shot 2016-11-17 at 11.57.14 AM.png] 1. What's the meaning of the green dot? 2. Should this be documented anywhere (if it isn't already)? Preferably a tooltip or something directly in the UI would explain the significance. Nick

Re: [VOTE] Release Apache Spark 2.0.2 (RC3)

2016-11-14 Thread Nicholas Chammas
Has the release already been made? I didn't see any announcement, but Homebrew has already updated to 2.0.2. On Fri, Nov 11, 2016 at 2:59 PM, Reynold Xin wrote: > The vote has passed with the following +1s and no -1. I will work on > packaging the release. > > +1: > > Reynold Xin*

Re: Strongly Connected Components

2016-11-13 Thread Nicholas Chammas
FYI: There is a new connected components implementation coming in GraphFrames 0.3. See: https://github.com/graphframes/graphframes/pull/119 Implementation is based on: https://mmds-data.org/presentations/2014/vassilvitskii_mmds14.pdf Nick On Sat, Nov 12, 2016 at 3:01 PM Koert Kuipers

[jira] [Commented] (SPARK-18367) DataFrame join spawns unreasonably high number of open files

2016-11-11 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15658284#comment-15658284 ] Nicholas Chammas commented on SPARK-18367: -- Looks like lowering {{bypassMergeThreshold}} even

[jira] [Commented] (SPARK-18084) write.partitionBy() does not recognize nested columns that select() can access

2016-11-11 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15657285#comment-15657285 ] Nicholas Chammas commented on SPARK-18084: -- It sounds like from Michael's comment

[jira] [Commented] (SPARK-18367) DataFrame join spawns unreasonably high number of open files

2016-11-10 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656254#comment-15656254 ] Nicholas Chammas commented on SPARK-18367: -- Ah, sounds like the correct explanation to me. So

[jira] [Commented] (SPARK-18367) DataFrame join spawns unreasonably high number of open files

2016-11-10 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656230#comment-15656230 ] Nicholas Chammas commented on SPARK-18367: -- How are you monitoring the number of open files? I

[jira] [Commented] (SPARK-18367) DataFrame join spawns unreasonably high number of open files

2016-11-10 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656228#comment-15656228 ] Nicholas Chammas commented on SPARK-18367: -- Tomorrow I'll try running this on a Linux VM. Maybe

[jira] [Comment Edited] (SPARK-18367) DataFrame join spawns unreasonably high number of open files

2016-11-10 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656199#comment-15656199 ] Nicholas Chammas edited comment on SPARK-18367 at 11/11/16 5:30 AM

[jira] [Commented] (SPARK-18367) DataFrame join spawns unreasonably high number of open files

2016-11-10 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656199#comment-15656199 ] Nicholas Chammas commented on SPARK-18367: -- Tweaked repro script to show partitions before

[jira] [Commented] (SPARK-18367) DataFrame join spawns unreasonably high number of open files

2016-11-10 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656066#comment-15656066 ] Nicholas Chammas commented on SPARK-18367: -- I noticed that if I generate a DataFrame with fewer

[jira] [Comment Edited] (SPARK-18367) DataFrame join spawns unreasonably high number of open files

2016-11-10 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656019#comment-15656019 ] Nicholas Chammas edited comment on SPARK-18367 at 11/11/16 3:44 AM

[jira] [Updated] (SPARK-18367) DataFrame join spawns unreasonably high number of open files

2016-11-10 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-18367: - Attachment: spark-lsof.txt Here is the output of {{lsof}} on all the pids owned by Spark

[jira] [Updated] (SPARK-18367) DataFrame join spawns unreasonably high number of open files

2016-11-10 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-18367: - Description: I have a moderately complex DataFrame query that spawns north of 10K open

[jira] [Updated] (SPARK-18367) DataFrame join spawns unreasonably high number of open files

2016-11-10 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-18367: - Description: I have a moderately complex DataFrame query that spawns north of 10K open

[jira] [Commented] (SPARK-18367) DataFrame join spawns unreasonably high number of open files

2016-11-10 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655091#comment-15655091 ] Nicholas Chammas commented on SPARK-18367: -- I've updated the issue description with a minimal

[jira] [Updated] (SPARK-18367) DataFrame join spawns unreasonably high number of open files

2016-11-10 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-18367: - Description: I have a moderately complex DataFrame query that spawns north of 10K open

[jira] [Updated] (SPARK-18367) DataFrame join spawns unreasonably high number of open files

2016-11-10 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-18367: - Attachment: (was: plan-with-limit.txt) > DataFrame join spawns unreasonably h

[jira] [Updated] (SPARK-18367) DataFrame join spawns unreasonably high number of open files

2016-11-10 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-18367: - Attachment: (was: plan-without-limit.txt) > DataFrame join spawns unreasonably h

[jira] [Updated] (SPARK-18367) DataFrame join spawns unreasonably high number of open files

2016-11-10 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-18367: - Summary: DataFrame join spawns unreasonably high number of open files (was: limit

[jira] [Commented] (SPARK-18367) limit() makes the lame walk again

2016-11-10 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15655032#comment-15655032 ] Nicholas Chammas commented on SPARK-18367: -- Scratch that. This is not related to UDFs. > li

[jira] [Commented] (SPARK-18367) limit() makes the lame walk again

2016-11-10 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15654770#comment-15654770 ] Nicholas Chammas commented on SPARK-18367: -- Looks like this is a fundamental problem with Python

[jira] [Commented] (SPARK-18367) limit() makes the lame walk again

2016-11-09 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15652968#comment-15652968 ] Nicholas Chammas commented on SPARK-18367: -- Even if I cut the number of records I'm processing

[jira] [Comment Edited] (SPARK-18367) limit() makes the lame walk again

2016-11-09 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15652406#comment-15652406 ] Nicholas Chammas edited comment on SPARK-18367 at 11/10/16 3:24 AM

[jira] [Commented] (SPARK-18367) limit() makes the lame walk again

2016-11-09 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15652881#comment-15652881 ] Nicholas Chammas commented on SPARK-18367: -- To provide some context, this code base I'm
