Re: Validate spark sql

2023-12-24 Thread Nicholas Chammas
This is a user-list question, not a dev-list question. Moving this conversation to the user list and BCC-ing the dev list. Also, this statement > We are not validating against table or column existence. is not correct. When you call spark.sql(…), Spark will lookup the table references and

Re: SSH Tunneling issue with Apache Spark

2023-12-06 Thread Nicholas Chammas
nks for the advice Nicholas. > > As mentioned in the original email, I have tried JDBC + SSH Tunnel using > pymysql and sshtunnel and it worked fine. The problem happens only with Spark. > > Thanks, > Venkat > > > > On Wed, Dec 6, 2023 at 10:21 PM Nicholas Cha

Suppressing output from Apache Ivy (?) when calling spark-submit with --packages

2018-02-27 Thread Nicholas Chammas
I’m not sure whether this is something controllable via Spark, but when you call spark-submit with --packages you get a lot of output. Is there any way to suppress it? Does it come from Apache Ivy? I posted more details about what I’m seeing on Stack Overflow

Re: Trouble with PySpark UDFs and SPARK_HOME only on EMR

2017-06-22 Thread Nicholas Chammas
Here’s a repro for a very similar issue where Spark hangs on the UDF, which I think is related to the SPARK_HOME issue. I posted the repro on the EMR forum , but in case you can’t access it: 1. I’m running EMR 5.6.0, Spark 2.1.1, and

Re: Spark fair scheduler pools vs. YARN queues

2017-04-05 Thread Nicholas Chammas
rty to > set "spark.scheduler.pool" to something other than the default pool before > a particular Job intended to use that pool is started via that SparkContext. > > On Wed, Apr 5, 2017 at 1:11 PM, Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > > Hmm, so

Re: Spark fair scheduler pools vs. YARN queues

2017-04-05 Thread Nicholas Chammas
Hmm, so when I submit an application with `spark-submit`, I need to guarantee it resources using YARN queues and not Spark's scheduler pools. Is that correct? When are Spark's scheduler pools relevant/useful in this context? On Wed, Apr 5, 2017 at 3:54 PM Mark Hamstra

Re: New Amazon AMIs for EC2 script

2017-02-23 Thread Nicholas Chammas
spark-ec2 has moved to GitHub and is no longer part of the Spark project. A related issue from the current issue tracker that you may want to follow/comment on is this one: https://github.com/amplab/spark-ec2/issues/74 As I said there, I think requiring custom AMIs is one of the major maintenance

Re: Order of rows not preserved after cache + count + coalesce

2017-02-13 Thread Nicholas Chammas
RDDs and DataFrames do not guarantee any specific ordering of data. They are like tables in a SQL database. The only way to get a guaranteed ordering of rows is to explicitly specify an orderBy() clause in your statement. Any ordering you see otherwise is incidental. ​ On Mon, Feb 13, 2017 at

Re: Debugging a PythonException with no details

2017-01-17 Thread Nicholas Chammas
It seems it has to do with UDF..Could u share snippet of code you are > running? > Kr > > On 14 Jan 2017 1:40 am, "Nicholas Chammas" <nicholas.cham...@gmail.com> > wrote: > > I’m looking for tips on how to debug a PythonException that’s very sparse > on d

Debugging a PythonException with no details

2017-01-13 Thread Nicholas Chammas
I’m looking for tips on how to debug a PythonException that’s very sparse on details. The full exception is below, but the only interesting bits appear to be the following lines: org.apache.spark.api.python.PythonException: ... py4j.protocol.Py4JError: An error occurred while calling

Re: unsubscribe

2016-12-08 Thread Nicholas Chammas
i.c...@rbc.com> wrote: > I’m pretty sure I didn’t. > > > > *From:* Nicholas Chammas [mailto:nicholas.cham...@gmail.com] > *Sent:* Thursday, December 08, 2016 10:56 AM > *To:* Chen, Yan I; Di Zhu > > > *Cc:* user @spark > *Subject:* Re: unsubscribe > >

Re: unsubscribe

2016-12-08 Thread Nicholas Chammas
bed, > but I still received this email. > > > > > > *From:* Nicholas Chammas [mailto:nicholas.cham...@gmail.com] > *Sent:* Thursday, December 08, 2016 10:02 AM > *To:* Di Zhu > *Cc:* user @spark > *Subject:* Re: unsubscribe > > > > Yes, sorry about

Re: unsubscribe

2016-12-08 Thread Nicholas Chammas
Yes, sorry about that. I didn't think before responding to all those who asked to unsubscribe. On Thu, Dec 8, 2016 at 10:00 AM Di Zhu <jason4zhu.bigd...@gmail.com> wrote: > Could you send to individual privately without cc to all users every time? > > > On 8 Dec 2016, at

Re: unsubscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Thu, Dec 8, 2016 at 7:46 AM Ramon Rosa da Silva wrote: > > This e-mail message, including any attachments, is for the sole use of

Re: Unsubscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Thu, Dec 8, 2016 at 9:46 AM Tao Lu wrote: > >

Re: unsubscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Thu, Dec 8, 2016 at 8:01 AM Niki Pavlopoulou wrote: > unsubscribe >

Re: unsubscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Thu, Dec 8, 2016 at 7:50 AM Juan Caravaca wrote: > unsubscribe >

Re: Unsubscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Thu, Dec 8, 2016 at 9:54 AM Kishorkumar Patil wrote: > >

Re: Unsubscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Thu, Dec 8, 2016 at 9:42 AM Chen, Yan I wrote: > > > > ___ > > If you

Re: Unsubscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Thu, Dec 8, 2016 at 12:17 AM Prashant Singh Thakur < prashant.tha...@impetus.co.in> wrote: > > > > > Best Regards, > > Prashant Thakur > > Work : 6046 > >

Re: Unsubscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Thu, Dec 8, 2016 at 12:08 AM Kranthi Gmail wrote: > > > -- > Kranthi > > PS: Sent from mobile, pls excuse the brevity and typos. > >

Re: Unsubscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Thu, Dec 8, 2016 at 6:27 AM Vinicius Barreto < vinicius.s.barr...@gmail.com> wrote: > Unsubscribe > > Em 7 de dez de 2016 17:46, "map reduced"

Re: Unsubscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Thu, Dec 8, 2016 at 12:54 AM Roger Holenweger wrote: > > > - > To

Re: unscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Thu, Dec 8, 2016 at 1:34 AM smith_666 wrote: > > > >

Re: Unsubscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Thu, Dec 8, 2016 at 12:12 AM Ajit Jaokar wrote: > > > - > To

Re: unsubscribe

2016-12-08 Thread Nicholas Chammas
To unsubscribe e-mail: user-unsubscr...@spark.apache.org This is explained here: http://spark.apache.org/community.html#mailing-lists On Wed, Dec 7, 2016 at 10:53 PM Ajith Jose wrote: > >

Re: Strongly Connected Components

2016-11-13 Thread Nicholas Chammas
FYI: There is a new connected components implementation coming in GraphFrames 0.3. See: https://github.com/graphframes/graphframes/pull/119 Implementation is based on: https://mmds-data.org/presentations/2014/vassilvitskii_mmds14.pdf Nick On Sat, Nov 12, 2016 at 3:01 PM Koert Kuipers

Re: Scala Vs Python

2016-09-02 Thread Nicholas Chammas
s.com > > > *Disclaimer:* Use it at your own risk. Any and all responsibility for any > loss, damage or destruction of data or any other property which may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no case be liable for any mo

Re: Scala Vs Python

2016-09-02 Thread Nicholas Chammas
age or destruction of data or any other property which may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or destruction. > > > > On 2 September 201

Re: Scala Vs Python

2016-09-02 Thread Nicholas Chammas
On Fri, Sep 2, 2016 at 3:58 AM Mich Talebzadeh wrote: > I believe as we progress in time Spark is going to move away from Python. If > you look at 2014 Databricks code examples, they were mostly in Python. Now > they are mostly in Scala for a reason. > That's complete

Re: UNSUBSCRIBE

2016-08-10 Thread Nicholas Chammas
Please follow the links here to unsubscribe: http://spark.apache.org/community.html On Tue, Aug 9, 2016 at 5:14 PM abhishek singh wrote: > >

Re: UNSUBSCRIBE

2016-08-10 Thread Nicholas Chammas
Please follow the links here to unsubscribe: http://spark.apache.org/community.html On Tue, Aug 9, 2016 at 8:03 PM James Ding wrote: > >

Re: UNSUBSCRIBE

2016-08-10 Thread Nicholas Chammas
Please follow the links here to unsubscribe: http://spark.apache.org/community.html On Wed, Aug 10, 2016 at 2:46 AM Martin Somers wrote: > > > -- > M >

Re: Unsubscribe

2016-08-10 Thread Nicholas Chammas
Please follow the links here to unsubscribe: http://spark.apache.org/community.html On Tue, Aug 9, 2016 at 3:02 PM Hogancamp, Aaron < aaron.t.hoganc...@leidos.com> wrote: > Unsubscribe. > > > > Thanks, > > > > Aaron Hogancamp > > Data Scientist > > >

Re: Unsubscribe.

2016-08-10 Thread Nicholas Chammas
Please follow the links here to unsubscribe: http://spark.apache.org/community.html On Tue, Aug 9, 2016 at 3:05 PM Martin Somers wrote: > Unsubscribe. > > Thanks > M >

Re: Add column sum as new column in PySpark dataframe

2016-08-05 Thread Nicholas Chammas
I think this is what you need: import pyspark.sql.functions as sqlf df.withColumn('total', sqlf.sum(df.columns)) Nic On Thu, Aug 4, 2016 at 9:41 AM Javier Rey jre...@gmail.com wrote: Hi everybody, > > Sorry, I sent the last message, it was incomplete. This is

Re: registering udf to use in spark.sql('select...

2016-08-04 Thread Nicholas Chammas
curious what to > use instead. > > On Aug 4, 2016, at 3:54 PM, Nicholas Chammas <nicholas.cham...@gmail.com> > wrote: > > Have you looked at pyspark.sql.functions.udf and the associated examples? On Thu, Aug 4, 2016 at 9:10 AM, Ben Teeuwen <bteeu...@gmail.com> wrote: > &

Re: registering udf to use in spark.sql('select...

2016-08-04 Thread Nicholas Chammas
Have you looked at pyspark.sql.functions.udf and the associated examples? On Thu, Aug 4, 2016 at 9:10 AM, Ben Teeuwen wrote: > Hi, > > I’d like to use a UDF in pyspark 2.0. As in .. > > > def squareIt(x): > return x * x > > # register the function and define return type >

Re: spark-2.0 support for spark-ec2 ?

2016-07-27 Thread Nicholas Chammas
Yes, spark-ec2 has been removed from the main project, as called out in the Release Notes: http://spark.apache.org/releases/spark-release-2-0-0.html#removals You can still discuss spark-ec2 here or on Stack Overflow, as before. Bug reports and the like should now go on that AMPLab GitHub project

Re: Unsubscribe - 3rd time

2016-06-29 Thread Nicholas Chammas
> I'm not sure I've ever come across an email list that allows you to unsubscribe by responding to the list with "unsubscribe". Many noreply lists (e.g. companies sending marketing email) actually work that way, which is probably what most people are used to these days. What this list needs is

Re: Writing output of key-value Pair RDD

2016-05-04 Thread Nicholas Chammas
You're looking for this discussion: http://stackoverflow.com/q/23995040/877069 Also, a simpler alternative with DataFrames: https://github.com/apache/spark/pull/8375#issuecomment-202458325 On Wed, May 4, 2016 at 4:09 PM Afshartous, Nick wrote: > Hi, > > > Is there any

Re: spark-ec2 hitting yum install issues

2016-04-14 Thread Nicholas Chammas
If you log into the cluster and manually try that step does it still fail? Can you yum install anything else? You might want to report this issue directly on the spark-ec2 repo, btw: https://github.com/amplab/spark-ec2 Nick On Thu, Apr 14, 2016 at 9:08 PM sanusha

Re: Spark 1.6.1 packages on S3 corrupt?

2016-04-12 Thread Nicholas Chammas
Yes, this is a known issue. The core devs are already aware of it. [CC dev] FWIW, I believe the Spark 1.6.1 / Hadoop 2.6 package on S3 is not corrupt. It may be the only 1.6.1 package that is not corrupt, though. :/ Nick On Tue, Apr 12, 2016 at 9:00 PM Augustus Hong

Re: Reading Back a Cached RDD

2016-03-24 Thread Nicholas Chammas
Isn’t persist() only for reusing an RDD within an active application? Maybe checkpoint() is what you’re looking for instead? ​ On Thu, Mar 24, 2016 at 2:02 PM Afshartous, Nick wrote: > > Hi, > > > After calling RDD.persist(), is then possible to come back later and >

Re: Does pyspark still lag far behind the Scala API in terms of features

2016-03-02 Thread Nicholas Chammas
ich is the core problem anyway. > > > > Sent from my Verizon Wireless 4G LTE smartphone > > > ---- Original message > From: Nicholas Chammas <nicholas.cham...@gmail.com> > Date: 03/02/2016 5:43 PM (GMT-05:00) > To: Darren Govoni <dar...@ont

Re: Does pyspark still lag far behind the Scala API in terms of features

2016-03-02 Thread Nicholas Chammas
aditional > RDD? > > For us almost all the processing comes before there is structure to it. > > > > > > Sent from my Verizon Wireless 4G LTE smartphone > > > ---- Original message > From: Nicholas Chammas <nicholas.cham...@gmail.com> &g

Re: Does pyspark still lag far behind the Scala API in terms of features

2016-03-02 Thread Nicholas Chammas
> However, I believe, investing (or having some members of your group) learn and invest in Scala is worthwhile for few reasons. One, you will get the performance gain, especially now with Tungsten (not sure how it relates to Python, but some other knowledgeable people on the list, please chime

Re: Is this likely to cause any problems?

2016-02-19 Thread Nicholas Chammas
The docs mention spark-ec2 because it is part of the Spark project. There are many, many alternatives to spark-ec2 out there like EMR, but it's probably not the place of the official docs to promote any one of those third-party solutions. On Fri, Feb 19, 2016 at 11:05 AM James Hammerton

Re: Is spark-ec2 going away?

2016-01-27 Thread Nicholas Chammas
I noticed that in the main branch, the ec2 directory along with the spark-ec2 script is no longer present. It’s been moved out of the main repo to its own location: https://github.com/amplab/spark-ec2/pull/21 Is spark-ec2 going away in the next release? If so, what would be the best alternative

Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Nicholas Chammas
+1 Red Hat supports Python 2.6 on RHEL 5 until 2020, but otherwise yes, Python 2.6 is ancient history and the core Python developers stopped supporting it in 2013. RHEL 5 is not a good enough reason to continue support for Python

Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Nicholas Chammas
om > > wrote: > >> I don't see a reason Spark 2.0 would need to support Python 2.6. At this >> point, Python 3 should be the default that is encouraged. >> Most organizations acknowledge the 2.7 is common, but lagging behind the >> version they should theoretically use.

Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Nicholas Chammas
I think all the slaves need the same (or a compatible) version of Python installed since they run Python code in PySpark jobs natively. On Tue, Jan 5, 2016 at 6:02 PM Koert Kuipers <ko...@tresata.com> wrote: > interesting i didnt know that! > > On Tue, Jan 5, 2016 at 5:57 PM, N

Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Nicholas Chammas
va 7 and python 2.6, no matter how outdated that is. >>> >>> i dont like it either, but i cannot change it. >>> >>> we currently don't use pyspark so i have no stake in this, but if we did >>> i can assure you we would not upgrade to spark 2.x if python 2.6 was >&g

Re: Not all workers seem to run in a standalone cluster setup by spark-ec2 script

2015-12-04 Thread Nicholas Chammas
Quick question: Are you processing gzipped files by any chance? It's a common stumbling block people hit. See: http://stackoverflow.com/q/27531816/877069 Nick On Fri, Dec 4, 2015 at 2:28 PM Kyohey Hamaguchi wrote: > Hi, > > I have setup a Spark standalone-cluster, which

Re: Adding more slaves to a running cluster

2015-11-25 Thread Nicholas Chammas
spark-ec2 does not directly support adding instances to an existing cluster, apart from the special case of adding slaves to a cluster with a master but no slaves. There is an open issue to track adding this support, SPARK-2008 , but it doesn't

Re: spark-ec2 script to launch cluster running Spark 1.5.2 built with HIVE?

2015-11-23 Thread Nicholas Chammas
Don't the Hadoop builds include Hive already? Like spark-1.5.2-bin-hadoop2.6.tgz? On Mon, Nov 23, 2015 at 7:49 PM Jeff Schecter wrote: > Hi all, > > As far as I can tell, the bundled spark-ec2 script provides no way to > launch a cluster running Spark 1.5.2 pre-built with

Re: Upgrading Spark in EC2 clusters

2015-11-12 Thread Nicholas Chammas
spark-ec2 does not offer a way to upgrade an existing cluster, and from what I gather, it wasn't intended to be used to manage long-lasting infrastructure. The recommended approach really is to just destroy your existing cluster and launch a new one with the desired configuration. If you want to

Re: Spark EC2 script on Large clusters

2015-11-05 Thread Nicholas Chammas
Yeah, as Shivaram mentioned, this issue is well-known. It's documented in SPARK-5189 and a bunch of related issues. Unfortunately, it's hard to resolve this issue in spark-ec2 without rewriting large parts of the project. But if you take a crack

Re: Sorry, but Nabble and ML suck

2015-10-31 Thread Nicholas Chammas
Nabble is an unofficial archive of this mailing list. I don't know who runs it, but it's not Apache. There are often delays between when things get posted to the list and updated on Nabble, and sometimes things never make it over for whatever reason. This mailing list is, I agree, very 1980s.

Can we add an unsubscribe link in the footer of every email?

2015-10-21 Thread Nicholas Chammas
Every week or so someone emails the list asking to unsubscribe. Of course, that's not the right way to do it. You're supposed to email a different address than this one to unsubscribe, yet this is not in-your-face obvious, so many people miss it. And

Re: stability of Spark 1.4.1 with Python 3 versions

2015-10-14 Thread Nicholas Chammas
The Spark 1.4 release notes say that Python 3 is supported. The 1.4 docs are incorrect, and the 1.5 programming guide has been updated to indicate Python 3 support. On Wed, Oct 14, 2015 at 7:06 AM shoira.mukhsin...@bnpparibasfortis.com

Re: [survey] [spark-ec2] What do you like/dislike about spark-ec2?

2015-08-28 Thread Nicholas Chammas
/forms/erct2s6KRR As noted before, your results are anonymous and public. Thanks again for participating! I hope this has been useful to the community. Nick On Tue, Aug 25, 2015 at 1:31 PM Nicholas Chammas nicholas.cham...@gmail.com wrote: Final chance to fill out the survey! http://goo.gl

Re: [survey] [spark-ec2] What do you like/dislike about spark-ec2?

2015-08-25 Thread Nicholas Chammas
Final chance to fill out the survey! http://goo.gl/forms/erct2s6KRR I'm gonna close it to new responses tonight and send out a summary of the results. Nick On Thu, Aug 20, 2015 at 2:08 PM Nicholas Chammas nicholas.cham...@gmail.com wrote: I'm planning to close the survey to further responses

Re: [survey] [spark-ec2] What do you like/dislike about spark-ec2?

2015-08-20 Thread Nicholas Chammas
, Aug 17, 2015 at 11:09 AM Nicholas Chammas nicholas.cham...@gmail.com wrote: Howdy folks! I’m interested in hearing about what people think of spark-ec2 http://spark.apache.org/docs/latest/ec2-scripts.html outside of the formal JIRA process. Your answers will all be anonymous and public

[survey] [spark-ec2] What do you like/dislike about spark-ec2?

2015-08-17 Thread Nicholas Chammas
Howdy folks! I’m interested in hearing about what people think of spark-ec2 http://spark.apache.org/docs/latest/ec2-scripts.html outside of the formal JIRA process. Your answers will all be anonymous and public. If the embedded form below doesn’t work for you, you can use this link to get the

Re: spark spark-ec2 credentials using aws_security_token

2015-07-27 Thread Nicholas Chammas
You refer to `aws_security_token`, but I'm not sure where you're specifying it. Can you elaborate? Is it an environment variable? On Mon, Jul 27, 2015 at 4:21 AM Jan Zikeš jan.zi...@centrum.cz wrote: Hi, I would like to ask if it is currently possible to use spark-ec2 script together with

Re: spark ec2 as non-root / any plan to improve that in the future ?

2015-07-09 Thread Nicholas Chammas
No plans to change that at the moment, but agreed it is against accepted convention. It would be a lot of work to change the tool, change the AMIs, and test everything. My suggestion is not to hold your breath for such a change. spark-ec2, as far as I understand, is not intended for spinning up

Re: dataframe left joins are not working as expected in pyspark

2015-06-27 Thread Nicholas Chammas
Yeah, you shouldn't have to rename the columns before joining them. Do you see the same behavior on 1.3 vs 1.4? Nick On Sat, Jun 27, 2015 at 2:51 AM, Axel Dahl a...@whisperstream.com wrote: still feels like a bug to have to create unique names before a join. On Fri, Jun 26, 2015 at 9:51 PM, ayan

Re: dataframe left joins are not working as expected in pyspark

2015-06-27 Thread Nicholas Chammas
: I've only tested on 1.4, but imagine 1.3 is the same or a lot of people's code would be failing right now. On Saturday, June 27, 2015, Nicholas Chammas nicholas.cham...@gmail.com wrote: Yeah, you shouldn't have to rename the columns before joining them. Do you see the same behavior on 1.3 vs

Re: Required settings for permanent HDFS Spark on EC2

2015-06-05 Thread Nicholas Chammas
If your problem is that stopping/starting the cluster resets configs, then you may be running into this issue: https://issues.apache.org/jira/browse/SPARK-4977 Nick On Thu, Jun 4, 2015 at 2:46 PM barmaley o...@solver.com wrote: Hi - I'm having similar problem with switching from ephemeral to

Re: Wish for 1.4: upper bound on # tasks in Mesos

2015-05-20 Thread Nicholas Chammas
To put this on the devs' radar, I suggest creating a JIRA for it (and checking first if one already exists). issues.apache.org/jira/ Nick On Tue, May 19, 2015 at 1:34 PM Matei Zaharia matei.zaha...@gmail.com wrote: Yeah, this definitely seems useful there. There might also be some ways to

Re: Virtualenv pyspark

2015-05-08 Thread Nicholas Chammas
This is an interesting question. I don't have a solution for you, but you may be interested in taking a look at Anaconda Cluster http://continuum.io/anaconda-cluster. It's made by the same people behind Conda (an alternative to pip focused on data science packages) and may offer a better way of

Re: How to deploy self-build spark source code on EC2

2015-04-28 Thread Nicholas Chammas
[-dev] [+user] This is a question for the user list, not the dev list. Use the --spark-version and --spark-git-repo options to specify your own repo and hash to deploy. Source code link. https://github.com/apache/spark/blob/268c419f1586110b90e68f98cd000a782d18828c/ec2/spark_ec2.py#L189-L195

Re: Querying Cluster State

2015-04-26 Thread Nicholas Chammas
The Spark web UI offers a JSON interface with some of this information. http://stackoverflow.com/a/29659630/877069 It's not an official API, so be warned that it may change unexpectedly between versions, but you might find it helpful. Nick On Sun, Apr 26, 2015 at 9:46 AM

Re: Apache Spark User List: people's responses not showing in the browser view

2015-03-19 Thread Nicholas Chammas
Nabble is a third-party site that tries its best to archive mail sent out over the list. Nothing guarantees it will be in sync with the real mailing list. To get the truth on what was sent over this, Apache-managed list, you unfortunately need to go the Apache archives:

Re: Apache Spark User List: people's responses not showing in the browser view

2015-03-19 Thread Nicholas Chammas
-hadoop.com which provides better search capability. Cheers On Thu, Mar 19, 2015 at 6:48 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Nabble is a third-party site that tries its best to archive mail sent out over the list. Nothing guarantees it will be in sync with the real mailing list

Re: Apache Spark User List: people's responses not showing in the browser view

2015-03-19 Thread Nicholas Chammas
to find stuff in. Is there a search engine on top of them? so as to find e.g. your own posts easily? On Thu, Mar 19, 2015 at 10:34 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Sure, you can use Nabble or search-hadoop or whatever you prefer. My point is just that the source of truth

Re: Processing of text file in large gzip archive

2015-03-16 Thread Nicholas Chammas
You probably want to update this line as follows: lines = sc.textFile('file.gz').repartition(sc.defaultParallelism * 3) For more details on why, see this answer http://stackoverflow.com/a/27631722/877069. Nick ​ On Mon, Mar 16, 2015 at 6:50 AM Marius Soutier mps@gmail.com wrote: 1. I

Re: Posting to the list

2015-02-23 Thread Nicholas Chammas
Nabble is a third-party site. If you send stuff through Nabble, Nabble has to forward it along to the Apache mailing list. If something goes wrong with that, you will have a message show up on Nabble that no-one saw. The reverse can also happen, where something actually goes out on the list and

Re: Launching Spark cluster on EC2 with Ubuntu AMI

2015-02-23 Thread Nicholas Chammas
I know that Spark EC2 scripts are not guaranteed to work with custom AMIs but still, it should work… Nope, it shouldn’t, unfortunately. The Spark base AMIs are custom-built for spark-ec2. No other AMI will work unless it was built with that goal in mind. Using a random AMI from the Amazon

Re: SQLContext.applySchema strictness

2015-02-14 Thread Nicholas Chammas
Would it make sense to add an optional validate parameter to applySchema() which defaults to False, both to give users the option to check the schema immediately and to make the default behavior clearer? ​ On Sat Feb 14 2015 at 9:18:59 AM Michael Armbrust mich...@databricks.com wrote: Doing

Re: How to create spark AMI in AWS

2015-02-09 Thread Nicholas Chammas
at 3:59 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Guodong, spark-ec2 does not currently support the cn-north-1 region, but you can follow [SPARK-4241](https://issues.apache.org/jira/browse/SPARK-4241) to find out when it does. The base AMI used to generate the current Spark AMIs

Re: How to create spark AMI in AWS

2015-02-09 Thread Nicholas Chammas
Guodong, spark-ec2 does not currently support the cn-north-1 region, but you can follow [SPARK-4241](https://issues.apache.org/jira/browse/SPARK-4241) to find out when it does. The base AMI used to generate the current Spark AMIs is very old. I'm not sure anyone knows what it is anymore. What I

Re: spark 1.2 ec2 launch script hang

2015-01-28 Thread Nicholas Chammas
clusters on EC2 since with no problems.) On Wed Jan 28 2015 at 12:05:43 PM Nicholas Chammas nicholas.cham...@gmail.com wrote: Ey-chih, That makes more sense. This is a known issue that will be fixed as part of SPARK-5242 https://issues.apache.org/jira/browse/SPARK-5242. Charles, Thanks

Re: spark 1.2 ec2 launch script hang

2015-01-28 Thread Nicholas Chammas
. But the second execution when things worked with an absolute path could have worked because the random hosts that came up on EC2 were never in my known_hosts. On Wed Jan 28 2015 at 3:45:36 PM Nicholas Chammas nicholas.cham...@gmail.com wrote: Hmm, I can’t see why using ~ would be problematic

Re: spark 1.2 ec2 launch script hang

2015-01-28 Thread Nicholas Chammas
If that was indeed the problem, I suggest updating your answer on SO http://stackoverflow.com/a/28005151/877069 to help others who may run into this same problem. ​ On Wed Jan 28 2015 at 9:40:39 PM Nicholas Chammas nicholas.cham...@gmail.com wrote: Thanks for sending this over, Peter. What

Re: saving rdd to multiple files named by the key

2015-01-27 Thread Nicholas Chammas
There is also SPARK-3533 https://issues.apache.org/jira/browse/SPARK-3533, which proposes to add a convenience method for this. ​ On Mon Jan 26 2015 at 10:38:56 PM Aniket Bhatnagar aniket.bhatna...@gmail.com wrote: This might be helpful:

Re: spark 1.2 ec2 launch script hang

2015-01-27 Thread Nicholas Chammas
For those who found that absolute vs. relative path for the pem file mattered, what OS and shell are you using? What version of Spark are you using? ~/ vs. absolute path shouldn’t matter. Your shell will expand the ~/ to the absolute path before sending it to spark-ec2. (i.e. tilde expansion.)

Re: Analyzing data from non-standard data sources (e.g. AWS Redshift)

2015-01-24 Thread Nicholas Chammas
I believe databricks provides an rdd interface to redshift. Did you check spark-packages.org? On Sat, Jan 24, 2015 at 6:45 AM Denis Mikhalkin deni...@yahoo.com.invalid wrote: Hello, we've got some analytics data in AWS Redshift. The data is being constantly updated. I'd like to be able to

Re: Discourse: A proposed alternative to the Spark User list

2015-01-23 Thread Nicholas Chammas
or communication fora - provided that they allow exporting the conversation if those sites were to change course. However, the state of the art stands as such. - Patrick On Wed, Jan 21, 2015 at 8:43 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Josh / Patrick, What do y’all think

Re: Discourse: A proposed alternative to the Spark User list

2015-01-23 Thread Nicholas Chammas
https://issues.apache.org/jira/browse/SPARK-5390 On Fri Jan 23 2015 at 12:05:00 PM Gerard Maas gerard.m...@gmail.com wrote: +1 On Fri, Jan 23, 2015 at 5:58 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: That sounds good to me. Shall I open a JIRA / PR about updating the site

Re: Discourse: A proposed alternative to the Spark User list

2015-01-22 Thread Nicholas Chammas
I agree with Sean that a Spark-specific Stack Exchange likely won't help and almost certainly won't make it out of Area 51. The idea certainly sounds nice from our perspective as Spark users, but it doesn't mesh with the structure of Stack Exchange or the criteria for creating new sites. On Thu

Re: Discourse: A proposed alternative to the Spark User list

2015-01-22 Thread Nicholas Chammas
, Nicholas Chammas wrote: I think a few things need to be laid out clearly: 1. This mailing list is the “official” user discussion platform. That is, it is sponsored and managed by the ASF. 2. Users are free to organize independent discussion platforms focusing on Spark

Re: Discourse: A proposed alternative to the Spark User list

2015-01-21 Thread Nicholas Chammas
Josh / Patrick, What do y’all think of the idea of promoting Stack Overflow as a place to ask questions over this list, as long as the questions fit SO’s guidelines ( how-to-ask http://stackoverflow.com/help/how-to-ask, dont-ask http://stackoverflow.com/help/dont-ask)? The apache-spark

Re: Discourse: A proposed alternative to the Spark User list

2015-01-21 Thread Nicholas Chammas
I think a few things need to be laid out clearly: 1. This mailing list is the “official” user discussion platform. That is, it is sponsored and managed by the ASF. 2. Users are free to organize independent discussion platforms focusing on Spark, and there is already one such platform

Re: pyspark sc.textFile uses only 4 out of 32 threads per node

2015-01-20 Thread Nicholas Chammas
anyone reproduce it with v1.1? On Wed, Dec 17, 2014 at 2:14 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: Rui is correct. Check how many partitions your RDD has after loading the gzipped files. e.g. rdd.getNumPartitions(). If that number is way less than the number of cores

Re: Cluster hangs in 'ssh-ready' state using Spark 1.2 EC2 launch script

2015-01-18 Thread Nicholas Chammas
Nathan, I posted a bunch of questions for you as a comment on your question http://stackoverflow.com/q/28002443/877069 on Stack Overflow. If you answer them (don't forget to @ping me) I may be able to help you. Nick On Sat Jan 17 2015 at 3:49:54 PM gen tang gen.tan...@gmail.com wrote: Hi,

Re: Discourse: A proposed alternative to the Spark User list

2015-01-17 Thread Nicholas Chammas
The Stack Exchange community will not support creating a whole new site just for Spark (otherwise you’d see dedicated sites for much larger topics like “Python”). Their tagging system works well enough to separate questions about different topics, and the apache-spark

Re: dockerized spark executor on mesos?

2015-01-15 Thread Nicholas Chammas
The AMPLab maintains a bunch of Docker files for Spark here: https://github.com/amplab/docker-scripts Hasn't been updated since 1.0.0, but might be a good starting point. On Wed Jan 14 2015 at 12:14:13 PM Josh J joshjd...@gmail.com wrote: We have dockerized Spark Master and worker(s)

Re: Accidental kill in UI

2015-01-09 Thread Nicholas Chammas
As Sean said, this definitely sounds like something worth a JIRA issue (and PR). On Fri Jan 09 2015 at 8:17:34 AM Sean Owen so...@cloudera.com wrote: (FWIW yes I think this should certainly be a POST. The link can become a miniature form to achieve this and then the endpoint just needs to
