Re: A proposal for Spark 2.0

2015-11-12 Thread Nicholas Chammas
With regard to machine learning, it would be great to move useful features from MLlib to ML and deprecate the former. The current structure of two separate machine learning packages seems to be somewhat confusing. With regard to GraphX, it would be great to deprecate the use of RDD in GraphX and

Re: A proposal for Spark 2.0

2015-11-10 Thread Nicholas Chammas
> For this reason, I would *not* propose doing major releases to break substantial APIs or perform large re-architecting that prevent users from upgrading. Spark has always had a culture of evolving architecture incrementally and making changes - and I don't think we want to change this model.

Re: Recommended change to core-site.xml template

2015-11-05 Thread Nicholas Chammas
Thanks for sharing this, Christian. What build of Spark are you using? If I understand correctly, if you are using Spark built against Hadoop 2.6+ then additional configs alone won't help because additional libraries also need to be installed.

Re: Recommended change to core-site.xml template

2015-11-05 Thread Nicholas Chammas
ly helps with this as well. > Without the instance-profile, we got it working by copying a > .aws/credentials file up to each node. We could easily automate that > through the templates. > > I don't need any additional libraries. We just need to change the > core-site.xml > > -C

Re: Spark EC2 script on Large clusters

2015-11-05 Thread Nicholas Chammas
Yeah, as Shivaram mentioned, this issue is well-known. It's documented in SPARK-5189 and a bunch of related issues. Unfortunately, it's hard to resolve this issue in spark-ec2 without rewriting large parts of the project. But if you take a crack

Re: [VOTE] Release Apache Spark 1.5.2 (RC2)

2015-11-05 Thread Nicholas Chammas
-0 The spark-ec2 version is still set to 1.5.1. Nick On Wed, Nov 4, 2015 at 8:20 PM Egor Pahomov wrote: > +1 > > Things, which our infrastructure use and I checked: > > Dynamic allocation > Spark

Re: Recommended change to core-site.xml template

2015-11-05 Thread Nicholas Chammas
<https://issues.apache.org/jira/browse/SPARK-7442>. On Fri, Nov 6, 2015 at 12:22 AM Christian <engr...@gmail.com> wrote: > Even with the changes I mentioned above? > On Thu, Nov 5, 2015 at 8:10 PM Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > >> Yep, I t

Re: Recommended change to core-site.xml template

2015-11-05 Thread Nicholas Chammas
> from: spark-1.5.1-bin-hadoop1 > > Are you saying there might be different behavior if I download > spark-1.5.1-hadoop-2.6 and create my cluster? > > On Thu, Nov 5, 2015 at 1:28 PM, Christian <engr...@gmail.com> wrote: > >> Spark 1.5.1-hadoop1 >> >> On

Re: Downloading Hadoop from s3://spark-related-packages/

2015-11-01 Thread Nicholas Chammas
Nick On Sun, Nov 1, 2015 at 5:32 PM Shivaram Venkataraman < shiva...@eecs.berkeley.edu> wrote: > On Sun, Nov 1, 2015 at 2:16 PM, Nicholas Chammas > <nicholas.cham...@gmail.com> wrote: > > OK, I’ll focus on the Apache mirrors going forward. > > > > Th

Re: Downloading Hadoop from s3://spark-related-packages/

2015-11-01 Thread Nicholas Chammas
d > just do something like > > wget > http://www.apache.org/dyn/closer.lua?filename=hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz&action=download > > Thanks > Shivaram > > On Sun, Nov 1, 2015 at 3:18 PM, Nicholas Chammas > <nicholas.cham...@gmail.com> wrote: >
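The mirror-redirect URL in the quoted wget command can be assembled programmatically. The following is only a sketch: it assumes the final query parameter is `action=download` (the archive snippet is garbled at that point), and `mirror_download_url` is a hypothetical helper name, not something from spark-ec2. It builds the URL and does not perform the download.

```python
from urllib.parse import urlencode

# Resolver endpoint taken from the wget command quoted in the thread.
MIRROR_RESOLVER = "http://www.apache.org/dyn/closer.lua"


def mirror_download_url(version: str) -> str:
    """Return the closer.lua URL for a Hadoop release tarball.

    Hypothetical helper. Assumes the hadoop/common/hadoop-<version>/ layout
    seen in the quoted command and an `action=download` query parameter.
    """
    path = f"hadoop/common/hadoop-{version}/hadoop-{version}.tar.gz"
    # safe="/" keeps the path separators readable instead of %2F.
    query = urlencode({"filename": path, "action": "download"}, safe="/")
    return f"{MIRROR_RESOLVER}?{query}"


url = mirror_download_url("2.7.1")
```

When passing such a URL to wget on a shell, it would need quoting, since an unquoted `&` backgrounds the command.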

Re: Downloading Hadoop from s3://spark-related-packages/

2015-11-01 Thread Nicholas Chammas
> On Sun, Nov 1, 2015 at 2:30 AM, Steve Loughran <ste...@hortonworks.com> > wrote: > > > > On 1 Nov 2015, at 03:17, Nicholas Chammas <nicholas.cham...@gmail.com> > > wrote: > > > > https://s3.amazonaws.com/spark-related-packages/ > > > > spa

Downloading Hadoop from s3://spark-related-packages/

2015-10-31 Thread Nicholas Chammas
https://s3.amazonaws.com/spark-related-packages/ spark-ec2 uses this bucket to download and install HDFS on clusters. Is it owned by the Spark project or by the AMPLab? Anyway, it looks like the latest Hadoop install available on there is Hadoop 2.4.0. Are there plans to add newer versions of

Re: Sorry, but Nabble and ML suck

2015-10-31 Thread Nicholas Chammas
Nabble is an unofficial archive of this mailing list. I don't know who runs it, but it's not Apache. There are often delays between when things get posted to the list and updated on Nabble, and sometimes things never make it over for whatever reason. This mailing list is, I agree, very 1980s.

[jira] [Commented] (SPARK-3342) m3 instances don't get local SSDs

2015-10-26 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974660#comment-14974660 ] Nicholas Chammas commented on SPARK-3342: - FWIW, that statement on M3 instances is [no longer

[jira] [Commented] (SPARK-10002) SSH problem during Setup of Spark(1.3.0) cluster on EC2

2015-10-22 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14969814#comment-14969814 ] Nicholas Chammas commented on SPARK-10002: -- [~deepalib] - Is {{--private-ips}} the solution

Can we add an unsubscribe link in the footer of every email?

2015-10-21 Thread Nicholas Chammas
Every week or so someone emails the list asking to unsubscribe. Of course, that's not the right way to do it. You're supposed to email a different address than this one to unsubscribe, yet this is not in-your-face obvious, so many people miss it. And

Re: SPARK_MASTER_IP actually expects a DNS name, not IP address

2015-10-16 Thread Nicholas Chammas
t-master.sh -h xxx.xxx.xxx.xxx > > and then use the IP when you start the slaves: > > sbin/start-slave.sh spark://xxx.xxx.xxx.xxx:7077 > > ? > > Regards > JB > > On 10/16/2015 06:01 PM, Nicholas Chammas wrote: > > I'd look into tracing a possible bug here, but I'm no
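The `spark://host:port` form passed to the slaves in the quoted commands can be sketched as a small helper. This is illustrative only: `master_url` and `is_ip_literal` are hypothetical names, not part of Spark, and the default port 7077 is taken from the quoted command. The second function shows one way to tell whether a configured master value is an IP literal or a DNS name, which is the distinction this thread is about.

```python
import ipaddress

# Port used in the start-slave.sh command quoted in the thread.
DEFAULT_MASTER_PORT = 7077


def master_url(host: str, port: int = DEFAULT_MASTER_PORT) -> str:
    """Build a standalone master URL (hypothetical helper, not Spark code)."""
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return f"spark://{host}:{port}"


def is_ip_literal(host: str) -> bool:
    """Return True if host parses as an IP address rather than a DNS name."""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True


example = master_url("10.0.0.5")
```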

Re: SPARK_MASTER_IP actually expects a DNS name, not IP address

2015-10-16 Thread Nicholas Chammas
/28162991/cant-run-spark-1-2-in-standalone-mode-on-mac > http://stackoverflow.com/questions/29412157/passing-hostname-to-netty > > FYI > > On Wed, Oct 14, 2015 at 7:10 PM, Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > >> I’m setting the Spark maste

Re: SPARK_MASTER_IP actually expects a DNS name, not IP address

2015-10-16 Thread Nicholas Chammas
Nick On Fri, Oct 16, 2015 at 12:05 PM Sean Owen <so...@cloudera.com> wrote: > It's used in scripts like sbin/start-master.sh > > On Fri, Oct 16, 2015 at 5:01 PM, Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > >> I'd look into tracing a possible

Re: stability of Spark 1.4.1 with Python 3 versions

2015-10-14 Thread Nicholas Chammas
The Spark 1.4 release notes say that Python 3 is supported. The 1.4 docs are incorrect, and the 1.5 programming guide has been updated to indicate Python 3 support. On Wed, Oct 14, 2015 at 7:06 AM shoira.mukhsin...@bnpparibasfortis.com

Re: [ANNOUNCE] Announcing Spark 1.5.1

2015-10-11 Thread Nicholas Chammas
You can find the source tagged for release on GitHub, as was clearly linked to in the thread to vote on the release (titled "[VOTE] Release Apache Spark 1.5.1 (RC1)"). Is there something about that thread that was unclear? Nick On Sun, Oct

Re: Spark 1.5.1 - Scala 2.10 - Hadoop 1 package is missing from S3

2015-10-07 Thread Nicholas Chammas
now until something changes. If it changes, then those projects > might need to build Spark on their own and host older hadoop versions, etc. > > On Wed, Oct 7, 2015 at 9:59 AM, Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > >> Thanks guys. >> >> Regarding

Re: Spark 1.5.1 - Scala 2.10 - Hadoop 1 package is missing from S3

2015-10-07 Thread Nicholas Chammas
, Oct 5, 2015 at 2:41 PM, Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > >> Thanks for looking into this Josh. >> >> On Mon, Oct 5, 2015 at 5:39 PM Josh Rosen <joshro...@databricks.com> >> wrote: >> >>> I'm working on a fix for th

Re: Spark 1.5.1 - Scala 2.10 - Hadoop 1 package is missing from S3

2015-10-05 Thread Nicholas Chammas
on't upload new artifacts with different SHAs for the > builds which *did* succeed). > > I expect to have this finished in the next day or so; I'm currently > blocked by some infra downtime but expect that to be resolved soon. > > - Josh > > On Mon, Oct 5, 2015 at 8:46

Re: Spark 1.5.1 - Scala 2.10 - Hadoop 1 package is missing from S3

2015-10-05 Thread Nicholas Chammas
reaks spark-ec2 script. > > On Mon, Oct 5, 2015 at 5:20 AM, Ted Yu <yuzhih...@gmail.com> wrote: > >> hadoop1 package for Scala 2.10 wasn't in RC1 either: >> http://people.apache.org/~pwendell/spark-releases/spark-1.5.1-rc1-bin/ >> >> On Sun, Oct 4, 2015 at 5:1

Spark 1.5.1 - Scala 2.10 - Hadoop 1 package is missing from S3

2015-10-04 Thread Nicholas Chammas
I’m looking here: https://s3.amazonaws.com/spark-related-packages/ I believe this is where one set of official packages is published. Please correct me if this is not the case. It appears that almost every version of Spark up to and including 1.5.0 has included a --bin-hadoop1.tgz release (e.g.

[issue25284] Spec for BaseEventLoop.run_in_executor(executor, callback, *args) is outdated in documentation

2015-09-30 Thread Nicholas Chammas
Changes by Nicholas Chammas <nicholas.cham...@gmail.com>: -- nosy: +Nicholas Chammas ___ Python tracker <rep...@bugs.python.org> <http://bugs.python

Re: How to get the HDFS path for each RDD

2015-09-27 Thread Nicholas Chammas
Shouldn't this discussion be held on the user list and not the dev list? The dev list (this list) is for discussing development on Spark itself. Please move the discussion accordingly. Nick On Sun, Sep 27, 2015 at 10:57 PM, Fengdong Yu wrote: > Hi Anchit, > can you create

[jira] [Commented] (SPARK-2622) Add Jenkins build numbers to SparkQA messages

2015-09-17 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14803169#comment-14803169 ] Nicholas Chammas commented on SPARK-2622: - [~mxm] - I noticed you have been posting this kind

[jira] [Commented] (SPARK-2622) Add Jenkins build numbers to SparkQA messages

2015-09-17 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14804559#comment-14804559 ] Nicholas Chammas commented on SPARK-2622: - No worries. Thanks for quickly finding and resolving

[jira] [Commented] (SPARK-4216) Eliminate duplicate Jenkins GitHub posts from AMPLab

2015-09-16 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791016#comment-14791016 ] Nicholas Chammas commented on SPARK-4216: - Thanks Josh! > Eliminate duplicate Jenkins Git

[jira] [Commented] (SPARK-3369) Java mapPartitions Iterator->Iterable is inconsistent with Scala's Iterator->Iterator

2015-09-08 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735323#comment-14735323 ] Nicholas Chammas commented on SPARK-3369: - Sean said: {quote} I don't think there's a "

Re: [survey] [spark-ec2] What do you like/dislike about spark-ec2?

2015-08-28 Thread Nicholas Chammas
/forms/erct2s6KRR As noted before, your results are anonymous and public. Thanks again for participating! I hope this has been useful to the community. Nick On Tue, Aug 25, 2015 at 1:31 PM Nicholas Chammas nicholas.cham...@gmail.com wrote: Final chance to fill out the survey! http://goo.gl

Re: [survey] [spark-ec2] What do you like/dislike about spark-ec2?

2015-08-25 Thread Nicholas Chammas
Final chance to fill out the survey! http://goo.gl/forms/erct2s6KRR I'm gonna close it to new responses tonight and send out a summary of the results. Nick On Thu, Aug 20, 2015 at 2:08 PM Nicholas Chammas nicholas.cham...@gmail.com wrote: I'm planning to close the survey to further responses

[jira] [Commented] (SPARK-10191) spark-ec2 cannot stop running cluster

2015-08-24 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710093#comment-14710093 ] Nicholas Chammas commented on SPARK-10191: -- Can you fill in the description here

[jira] [Commented] (SPARK-3533) Add saveAsTextFileByKey() method to RDDs

2015-08-20 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705446#comment-14705446 ] Nicholas Chammas commented on SPARK-3533: - {quote} Nicholas Chammas Have you been

Re: [survey] [spark-ec2] What do you like/dislike about spark-ec2?

2015-08-20 Thread Nicholas Chammas
, Aug 17, 2015 at 11:09 AM Nicholas Chammas nicholas.cham...@gmail.com wrote: Howdy folks! I’m interested in hearing about what people think of spark-ec2 http://spark.apache.org/docs/latest/ec2-scripts.html outside of the formal JIRA process. Your answers will all be anonymous and public

[jira] [Commented] (SPARK-3533) Add saveAsTextFileByKey() method to RDDs

2015-08-20 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705182#comment-14705182 ] Nicholas Chammas commented on SPARK-3533: - No need to open a separate ticket

[jira] [Commented] (SPARK-3533) Add saveAsTextFileByKey() method to RDDs

2015-08-17 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699613#comment-14699613 ] Nicholas Chammas commented on SPARK-3533: - [~silasdavis] - If you already have

[survey] [spark-ec2] What do you like/dislike about spark-ec2?

2015-08-17 Thread Nicholas Chammas
Howdy folks! I’m interested in hearing about what people think of spark-ec2 http://spark.apache.org/docs/latest/ec2-scripts.html outside of the formal JIRA process. Your answers will all be anonymous and public. If the embedded form below doesn’t work for you, you can use this link to get the

Re: Writing to multiple outputs in Spark

2015-08-14 Thread Nicholas Chammas
See: https://issues.apache.org/jira/browse/SPARK-3533 Feel free to comment there and make a case if you think the issue should be reopened. Nick On Fri, Aug 14, 2015 at 11:11 AM Abhishek R. Singh abhis...@tetrationanalytics.com wrote: A workaround would be to have multiple passes on the RDD

Re: Unsubscribe

2015-08-03 Thread Nicholas Chammas
The way to do that is to follow the Unsubscribe link here for dev@spark: http://spark.apache.org/community.html We can't drop you. You have to do it yourself. Nick On Mon, Aug 3, 2015 at 1:54 PM Trevor Grant trevor.d.gr...@gmail.com wrote: Please drop me from this list Trevor Grant Data

Re: Should spark-ec2 get its own repo?

2015-08-02 Thread Nicholas Chammas
On Sat, Aug 1, 2015 at 1:09 PM Matt Goodman meawo...@gmail.com wrote: I am considering porting some of this to a more general spark-cloud launcher, including google/aliyun/rackspace. It shouldn't be hard at all given the current approach for setup/install. FWIW, there are already some tools

Re: spark spark-ec2 credentials using aws_security_token

2015-07-27 Thread Nicholas Chammas
You refer to `aws_security_token`, but I'm not sure where you're specifying it. Can you elaborate? Is it an environment variable? On Mon, Jul 27, 2015 at 4:21 AM Jan Zikeš jan.zi...@centrum.cz wrote: Hi, I would like to ask if it is currently possible to use spark-ec2 script together with

Re: Should spark-ec2 get its own repo?

2015-07-13 Thread Nicholas Chammas
At a high level I see the spark-ec2 scripts as an effort to provide a reference implementation for launching EC2 clusters with Apache Spark On a side note, this is precisely how I used spark-ec2 for a personal project that does something similar: reference implementation. Nick On Mon, Jul 13, 2015

[jira] [Commented] (SPARK-8960) Style cleanup of spark_ec2.py

2015-07-10 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622542#comment-14622542 ] Nicholas Chammas commented on SPARK-8960: - Style cleanup is OK, but should

Re: spark ec2 as non-root / any plan to improve that in the future ?

2015-07-09 Thread Nicholas Chammas
No plans to change that at the moment, but agreed it is against accepted convention. It would be a lot of work to change the tool, change the AMIs, and test everything. My suggestion is not to hold your breath for such a change. spark-ec2, as far as I understand, is not intended for spinning up

Should spark-ec2 get its own repo?

2015-07-03 Thread Nicholas Chammas
spark-ec2 is kind of a mini project within a project. It’s composed of a set of EC2 AMIs https://github.com/mesos/spark-ec2/tree/branch-1.4/ami-list under someone’s account (maybe Patrick’s?) plus the following 2 code bases: - Main command line tool:

[jira] [Commented] (SPARK-8670) Nested columns can't be referenced (but they can be selected)

2015-06-29 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605822#comment-14605822 ] Nicholas Chammas commented on SPARK-8670: - Not sure. Does Scala offer the same

[jira] [Commented] (SPARK-8670) Nested columns can't be referenced (but they can be selected)

2015-06-29 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606178#comment-14606178 ] Nicholas Chammas commented on SPARK-8670: - FYI: `df.stats.age` works neither

[jira] [Comment Edited] (SPARK-8670) Nested columns can't be referenced (but they can be selected)

2015-06-29 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606328#comment-14606328 ] Nicholas Chammas edited comment on SPARK-8670 at 6/29/15 9:01 PM

[jira] [Resolved] (SPARK-8670) Nested columns can't be referenced (but they can be selected)

2015-06-29 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas resolved SPARK-8670. - Resolution: Invalid Nested columns can't be referenced (but they can be selected

[jira] [Commented] (SPARK-8670) Nested columns can't be referenced (but they can be selected)

2015-06-29 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606328#comment-14606328 ] Nicholas Chammas commented on SPARK-8670: - After a discussion with [~davies

Re: dataframe left joins are not working as expected in pyspark

2015-06-27 Thread Nicholas Chammas
Yeah, you shouldn't have to rename the columns before joining them. Do you see the same behavior on 1.3 vs 1.4? Nick On Sat, Jun 27, 2015 at 2:51 AM, Axel Dahl a...@whisperstream.com wrote: still feels like a bug to have to create unique names before a join. On Fri, Jun 26, 2015 at 9:51 PM, ayan

Re: dataframe left joins are not working as expected in pyspark

2015-06-27 Thread Nicholas Chammas
: I've only tested on 1.4, but imagine 1.3 is the same or a lot of people's code would be failing right now. On Saturday, June 27, 2015, Nicholas Chammas nicholas.cham...@gmail.com wrote: Yeah, you shouldn't have to rename the columns before joining them. Do you see the same behavior on 1.3 vs

[jira] [Created] (SPARK-8670) Nested columns can't be referenced (but they can be selected)

2015-06-26 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-8670: --- Summary: Nested columns can't be referenced (but they can be selected) Key: SPARK-8670 URL: https://issues.apache.org/jira/browse/SPARK-8670 Project: Spark

[jira] [Updated] (SPARK-8670) Nested columns can't be referenced (but they can be selected)

2015-06-26 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-8670: Description: This is strange and looks like a regression from 1.3. {code} import json

[jira] [Commented] (SPARK-8670) Nested columns can't be referenced (but they can be selected)

2015-06-26 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603572#comment-14603572 ] Nicholas Chammas commented on SPARK-8670: - cc [~rxin], [~davies] Nested columns

[jira] [Commented] (SPARK-8670) Nested columns can't be referenced (but they can be selected)

2015-06-26 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603658#comment-14603658 ] Nicholas Chammas commented on SPARK-8670: - I thought, per the discussion on [SPARK

[jira] [Resolved] (SPARK-6220) Allow extended EC2 options to be passed through spark-ec2

2015-06-23 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas resolved SPARK-6220. - Resolution: Won't Fix Resolving this issue as won't fix since it is of low importance

[jira] [Updated] (SPARK-8576) Add spark-ec2 options to assign launched instances into IAM roles and to set instance-initiated shutdown behavior

2015-06-23 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-8576: Summary: Add spark-ec2 options to assign launched instances into IAM roles and to set

[jira] [Created] (SPARK-8576) Add spark-ec2 options to assigned launched instances into IAM roles and to set instance-initiated shutdown behavior

2015-06-23 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-8576: --- Summary: Add spark-ec2 options to assigned launched instances into IAM roles and to set instance-initiated shutdown behavior Key: SPARK-8576 URL: https://issues.apache.org

Re: Stats on targets for 1.5.0

2015-06-19 Thread Nicholas Chammas
I think it would be fantastic if this work was burned down before adding big new chunks of work. The stat is worth keeping an eye on. +1, keeping in mind that burning down work also means just targeting it for a different release or closing it. :) Nick On Fri, Jun 19, 2015 at 3:18 PM Sean

[jira] [Commented] (SPARK-8417) spark-class has illegal statement

2015-06-18 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14591988#comment-14591988 ] Nicholas Chammas commented on SPARK-8417: - I'm not sure what I'm looking at. Can

[jira] [Commented] (SPARK-8429) Add ability to set additional tags

2015-06-18 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14592010#comment-14592010 ] Nicholas Chammas commented on SPARK-8429: - What is your use case for this feature

Re: Sidebar: issues targeted for 1.4.0

2015-06-18 Thread Nicholas Chammas
Given fixed time, adding more TODOs generally means other stuff has to be taken out for the release. If not, then it happens de facto anyway, which is worse than managing it on purpose. +1 to this. I wouldn't mind helping go through open issues on JIRA targeted for the next release around RC

[jira] [Commented] (SPARK-6220) Allow extended EC2 options to be passed through spark-ec2

2015-06-15 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14586334#comment-14586334 ] Nicholas Chammas commented on SPARK-6220: - please forgive my greenness No need

Re: Remove Hadoop 1 support (Hadoop 2.2) for Spark 1.5?

2015-06-12 Thread Nicholas Chammas
I'm personally in favor, but I don't have a sense of how many people still rely on Hadoop 1. Nick On Fri, Jun 12, 2015 at 9:13 AM, Steve Loughran ste...@hortonworks.com wrote: +1 for 2.2+ Not only are the APIs in Hadoop 2 better, there's more people testing Hadoop 2.x spark, and bugs in Hadoop

[jira] [Created] (SPARK-8316) Upgrade Maven to 3.3.3

2015-06-11 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-8316: --- Summary: Upgrade Maven to 3.3.3 Key: SPARK-8316 URL: https://issues.apache.org/jira/browse/SPARK-8316 Project: Spark Issue Type: Improvement

Re: Did the 3.4.4 docs get published early?

2015-06-11 Thread Nicholas Chammas
. Nick On Wed, Jun 10, 2015 at 2:25 PM Nicholas Chammas nicholas.cham...@gmail.com wrote: Also, just replacing the version number in the URL works for the python 3 series (use 3.X even for python 3.0), even farther back than the drop down menu allows. This does not help in this case: https

Did the 3.4.4 docs get published early?

2015-06-10 Thread Nicholas Chammas
For example, here is a New in version 3.4.4 method: https://docs.python.org/3/library/asyncio-task.html#asyncio.ensure_future However, the latest release appears to be 3.4.3: https://www.python.org/downloads/ Is this normal, or did the 3.4.4 docs somehow get published early by mistake? Nick
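For readers unfamiliar with the method in question, here is a minimal sketch of what `asyncio.ensure_future` does: it schedules a coroutine on the event loop and returns a future-like `Task` that can be awaited later. The coroutine name `fetch_answer` is made up for illustration, and the sketch uses `async`/`await` syntax and `asyncio.run`, both of which are newer than the 3.4 series discussed in this thread.

```python
import asyncio


async def fetch_answer():
    """Stand-in coroutine (hypothetical) that yields to the loop once."""
    await asyncio.sleep(0)
    return 42


async def main():
    # ensure_future wraps the coroutine in a Task and schedules it.
    task = asyncio.ensure_future(fetch_answer())
    # A Task is a Future subclass, so it can be awaited for its result.
    assert isinstance(task, asyncio.Future)
    return await task


result = asyncio.run(main())
```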

Re: Did the 3.4.4 docs get published early?

2015-06-10 Thread Nicholas Chammas
(like the one I linked to) are introduced in maintenance versions, it’s probably hard to separate them out into separate branches. Nick On Wed, Jun 10, 2015 at 10:11 AM Nicholas Chammas nicholas.cham...@gmail.com wrote: For example, here is a New in version 3.4.4 method: https

Re: Required settings for permanent HDFS Spark on EC2

2015-06-05 Thread Nicholas Chammas
If your problem is that stopping/starting the cluster resets configs, then you may be running into this issue: https://issues.apache.org/jira/browse/SPARK-4977 Nick On Thu, Jun 4, 2015 at 2:46 PM barmaley o...@solver.com wrote: Hi - I'm having similar problem with switching from ephemeral to

[jira] [Commented] (SPARK-5398) Support the eu-central-1 region for spark-ec2

2015-06-04 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573138#comment-14573138 ] Nicholas Chammas commented on SPARK-5398: - I don't have the credentials to do

[jira] [Commented] (SPARK-5398) Support the eu-central-1 region for spark-ec2

2015-06-04 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14573140#comment-14573140 ] Nicholas Chammas commented on SPARK-5398: - I don't have the credentials to do

[jira] [Issue Comment Deleted] (SPARK-5398) Support the eu-central-1 region for spark-ec2

2015-06-04 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-5398: Comment: was deleted (was: I don't have the credentials to do that, unfortunately. Maybe

[jira] [Commented] (SPARK-7900) Reduce number of tagging calls in spark-ec2

2015-06-03 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571460#comment-14571460 ] Nicholas Chammas commented on SPARK-7900: - I'm marking this as a duplicate

[jira] [Resolved] (SPARK-7900) Reduce number of tagging calls in spark-ec2

2015-06-03 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas resolved SPARK-7900. - Resolution: Duplicate Reduce number of tagging calls in spark-ec2

[jira] [Commented] (SPARK-4983) Add sleep() before tagging EC2 instances to allow instance metadata to propagate

2015-06-03 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571467#comment-14571467 ] Nicholas Chammas commented on SPARK-4983: - Per the discussion on [SPARK-7900], I

[jira] [Updated] (SPARK-5189) Reorganize EC2 scripts so that nodes can be provisioned independent of Spark master

2015-05-31 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-5189: Description: As of 1.2.0, we launch Spark clusters on EC2 by setting up the master first

[jira] [Commented] (SPARK-7900) Reduce number of tagging calls in spark-ec2

2015-05-28 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563208#comment-14563208 ] Nicholas Chammas commented on SPARK-7900: - The name tags are optional, but we can

[jira] [Commented] (SPARK-7900) Reduce number of tagging calls in spark-ec2

2015-05-28 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563053#comment-14563053 ] Nicholas Chammas commented on SPARK-7900: - An alternative approach would

[jira] [Created] (SPARK-7900) Reduce number of tagging calls in spark-ec2

2015-05-27 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-7900: --- Summary: Reduce number of tagging calls in spark-ec2 Key: SPARK-7900 URL: https://issues.apache.org/jira/browse/SPARK-7900 Project: Spark Issue Type

[jira] [Commented] (SPARK-7505) Update PySpark DataFrame docs: encourage __getitem__, mark as experimental, etc.

2015-05-22 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556860#comment-14556860 ] Nicholas Chammas commented on SPARK-7505: - cc [~davies] - I think the most

[jira] [Commented] (SPARK-7507) pyspark.sql.types.StructType and Row should implement __iter__()

2015-05-21 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555482#comment-14555482 ] Nicholas Chammas commented on SPARK-7507: - Since {{Row}} seems most analogous

[jira] [Commented] (SPARK-7507) pyspark.sql.types.StructType and Row should implement __iter__()

2015-05-21 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14554550#comment-14554550 ] Nicholas Chammas commented on SPARK-7507: - Related: A Stack Overflow question

Re: Wish for 1.4: upper bound on # tasks in Mesos

2015-05-20 Thread Nicholas Chammas
To put this on the devs' radar, I suggest creating a JIRA for it (and checking first if one already exists). issues.apache.org/jira/ Nick On Tue, May 19, 2015 at 1:34 PM Matei Zaharia matei.zaha...@gmail.com wrote: Yeah, this definitely seems useful there. There might also be some ways to

[jira] [Commented] (SPARK-7640) Private VPC with default Spark AMI breaks yum

2015-05-19 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551795#comment-14551795 ] Nicholas Chammas commented on SPARK-7640: - [~brdwrd] - According to [this doc

[jira] [Commented] (SPARK-7640) Private VPC with default Spark AMI breaks yum

2015-05-14 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544169#comment-14544169 ] Nicholas Chammas commented on SPARK-7640: - {quote} Switch everything to support

[jira] [Commented] (SPARK-7640) Private VPC with default Spark AMI breaks yum

2015-05-14 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14544288#comment-14544288 ] Nicholas Chammas commented on SPARK-7640: - If there is no way around this (like

[jira] [Comment Edited] (SPARK-7606) Document all PySpark SQL/DataFrame public methods with @since tag

2015-05-13 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542701#comment-14542701 ] Nicholas Chammas edited comment on SPARK-7606 at 5/13/15 8:57 PM

[jira] [Commented] (SPARK-7606) Document all PySpark SQL/DataFrame public methods with @since tag

2015-05-13 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542701#comment-14542701 ] Nicholas Chammas commented on SPARK-7606: - Just looked into this. If we are using

[jira] [Created] (SPARK-7606) Document all PySpark SQL/DataFrame public methods with @since tag

2015-05-13 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-7606: --- Summary: Document all PySpark SQL/DataFrame public methods with @since tag Key: SPARK-7606 URL: https://issues.apache.org/jira/browse/SPARK-7606 Project: Spark

[jira] [Commented] (SPARK-7606) Document all PySpark SQL/DataFrame public methods with @since tag

2015-05-13 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14542176#comment-14542176 ] Nicholas Chammas commented on SPARK-7606: - (I just cloned SPARK-7588.) Dunno what

Re: [PySpark DataFrame] When a Row is not a Row

2015-05-13 Thread Nicholas Chammas
the columns. Basically, the rows are just named tuples (called `Row`). -- Davies Liu Sent with Sparrow http://www.sparrowmailapp.com/?sig On Tuesday, May 12, 2015, at 4:49 AM, Nicholas Chammas wrote: This is really strange. # Spark 1.3.1 print type(results
