Re: [Distutils] How to deprecate a python package

2016-04-06 Thread Nicholas Chammas
FYI, there is an existing issue on Warehouse's tracker for this: https://github.com/pypa/warehouse/issues/345 ___ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig

[jira] [Closed] (SPARK-3821) Develop an automated way of creating Spark images (AMI, Docker, and others)

2016-04-05 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas closed SPARK-3821. --- Resolution: Won't Fix I'm resolving this as "Won't Fix" due to lack of interest,

Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-04-04 Thread Nicholas Chammas
ished one. > > Unfortunately however, I don't know what tool is used to generate the > > hash and I can't reproduce the format, so I ended up manually > > comparing the hashes. > > > > On Mon, Apr 4, 2016 at 2:39 PM, Nicholas Chammas > > <nicholas.cham...@

Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-04-04 Thread Nicholas Chammas
the root cause is > found. > > On Thu, Mar 24, 2016 at 7:25 AM, Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > >> Just checking in on this again as the builds on S3 are still broken. :/ >> >> Could it have something to do with us moving release-build

Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-04-04 Thread Nicholas Chammas
This is still an issue. The Spark 1.6.1 packages on S3 are corrupt. Is anyone looking into this issue? Is there anything contributors can do to help solve this problem? Nick On Sun, Mar 27, 2016 at 8:49 PM Nicholas Chammas <nicholas.cham...@gmail.com> wrote: > Pingity-ping-p

[jira] [Commented] (SPARK-3533) Add saveAsTextFileByKey() method to RDDs

2016-03-28 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15214577#comment-15214577 ] Nicholas Chammas commented on SPARK-3533: - I've added 2 workarounds to this issue

[jira] [Updated] (SPARK-3533) Add saveAsTextFileByKey() method to RDDs

2016-03-28 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-3533: Description: Users often have a single RDD of key-value pairs that they want to save

[Distutils] Thank you for the ability to do `pip install git+https://...`

2016-03-28 Thread Nicholas Chammas
Dunno how old/new this feature is, or what people did before it existed, but I just wanted to thank the people who thought of and built the ability to do installs from git+https. It lets me offer the following to my users when they want the “bleeding edge”

Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-03-27 Thread Nicholas Chammas
Pingity-ping-pong since this is still a problem. On Thu, Mar 24, 2016 at 4:08 PM Michael Armbrust <mich...@databricks.com> wrote: > Patrick is investigating. > > On Thu, Mar 24, 2016 at 7:25 AM, Nicholas Chammas < > nicholas.cham...@gmail.com> wrote: > >&g

Re: Reading Back a Cached RDD

2016-03-24 Thread Nicholas Chammas
Isn’t persist() only for reusing an RDD within an active application? Maybe checkpoint() is what you’re looking for instead? ​ On Thu, Mar 24, 2016 at 2:02 PM Afshartous, Nick wrote: > > Hi, > > > After calling RDD.persist(), is then possible to come back later and >

Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-03-24 Thread Nicholas Chammas
Just checking in on this again as the builds on S3 are still broken. :/ Could it have something to do with us moving release-build.sh <https://github.com/apache/spark/commits/master/dev/create-release/release-build.sh> ? ​ On Mon, Mar 21, 2016 at 1:43 PM Nicholas Chammas <nich

Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-03-21 Thread Nicholas Chammas
t; confusion, the link I get for a direct download of Spark 1.6.1 / > Hadoop 2.6 is > http://d3kbcqa49mib13.cloudfront.net/spark-1.6.1-bin-hadoop2.6.tgz > > On Fri, Mar 18, 2016 at 3:20 PM, Nicholas Chammas > <nicholas.cham...@gmail.com> wrote: > > I just retried the Spark 1.6.1

Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-03-20 Thread Nicholas Chammas
rable: exiting now > > On Thu, Mar 17, 2016 at 8:57 AM, Michael Armbrust <mich...@databricks.com> > wrote: > >> Patrick reuploaded the artifacts, so it should be fixed now. >> On Mar 16, 2016 5:48 PM, "Nicholas Chammas" <nicholas.cham...@gmail.com> >

[jira] [Commented] (SPARK-7481) Add Hadoop 2.6+ profile to pull in object store FS accessors

2016-03-19 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197451#comment-15197451 ] Nicholas Chammas commented on SPARK-7481: - (Sorry Steve; can't comment on your proposal since I

Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-03-19 Thread Nicholas Chammas
https://s3.amazonaws.com/spark-related-packages/spark-1.6.1-bin-hadoop2.6.tgz Does anyone else have trouble unzipping this? How did this happen? What I get is: $ gzip -t spark-1.6.1-bin-hadoop2.6.tgz gzip: spark-1.6.1-bin-hadoop2.6.tgz: unexpected end of file gzip:
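The `gzip -t` failure above can be reproduced without downloading anything. As a minimal sketch (the sample payload is made up; the real Spark tarball is not fetched here), truncating a valid gzip stream makes Python's `gzip` module raise `EOFError` — the same "unexpected end of file" condition the thread reports:

```python
import gzip

# Compress some data, then cut off the tail to simulate a corrupt download.
data = gzip.compress(b"spark package contents " * 100)
truncated = data[:-10]  # drops the CRC/length trailer

try:
    gzip.decompress(truncated)
except EOFError as e:
    print("corrupt archive:", e)
```

This is the programmatic equivalent of `gzip -t file.tgz` exiting non-zero.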

Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-03-19 Thread Nicholas Chammas
euploaded the artifacts, so it should be fixed now. > On Mar 16, 2016 5:48 PM, "Nicholas Chammas" <nicholas.cham...@gmail.com> > wrote: > >> Looks like the other packages may also be corrupt. I’m getting the same >> error for the Spark 1.6.1 / Hadoop 2.4 package. &

Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-03-19 Thread Nicholas Chammas
ux, I got: > > $ tar zxf spark-1.6.1-bin-hadoop2.6.tgz > > gzip: stdin: unexpected end of file > tar: Unexpected EOF in archive > tar: Unexpected EOF in archive > tar: Error is not recoverable: exiting now > > On Wed, Mar 16, 2016 at 5:15 PM, Nicholas Chammas < > nichol

Re: Spark 1.6.1 Hadoop 2.6 package on S3 corrupt?

2016-03-18 Thread Nicholas Chammas
t; I just experienced the issue, however retrying the download a second > time worked. Could it be that there is some load balancer/cache in > front of the archive and some nodes still serve the corrupt packages? > > On Fri, Mar 18, 2016 at 8:00 AM, Nicholas Chammas > <nicholas.cham..

[jira] [Commented] (SPARK-7505) Update PySpark DataFrame docs: encourage __getitem__, mark as experimental, etc.

2016-03-05 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15181776#comment-15181776 ] Nicholas Chammas commented on SPARK-7505: - I believe items 1, 3, and 4 still apply. They're minor

[jira] [Commented] (SPARK-13596) Move misc top-level build files into appropriate subdirs

2016-03-04 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15180072#comment-15180072 ] Nicholas Chammas commented on SPARK-13596: -- Looks like {{tox.ini}} is only used by {{pep8}}, so

Re: Does pyspark still lag far behind the Scala API in terms of features

2016-03-02 Thread Nicholas Chammas
ich is the core problem anyway. > > > > Sent from my Verizon Wireless 4G LTE smartphone > > > ---- Original message > From: Nicholas Chammas <nicholas.cham...@gmail.com> > Date: 03/02/2016 5:43 PM (GMT-05:00) > To: Darren Govoni <dar...@ont

Re: Does pyspark still lag far behind the Scala API in terms of features

2016-03-02 Thread Nicholas Chammas
aditional > RDD? > > For us almost all the processing comes before there is structure to it. > > > > > > Sent from my Verizon Wireless 4G LTE smartphone > > > ---- Original message > From: Nicholas Chammas <nicholas.cham...@gmail.com> &g

Re: Does pyspark still lag far behind the Scala API in terms of features

2016-03-02 Thread Nicholas Chammas
> However, I believe, investing (or having some members of your group) learn and invest in Scala is worthwhile for few reasons. One, you will get the performance gain, especially now with Tungsten (not sure how it relates to Python, but some other knowledgeable people on the list, please chime

[jira] [Commented] (SPARK-7481) Add Hadoop 2.6+ profile to pull in object store FS accessors

2016-03-02 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176559#comment-15176559 ] Nicholas Chammas commented on SPARK-7481: - I'm not comfortable working with Maven so I can't

[jira] [Commented] (SPARK-7481) Add Hadoop 2.6+ profile to pull in object store FS accessors

2016-03-02 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176551#comment-15176551 ] Nicholas Chammas commented on SPARK-7481: - {quote} One issue here that hadoop 2.6's hadoop-aws

[jira] [Commented] (SPARK-7481) Add Hadoop 2.6+ profile to pull in object store FS accessors

2016-03-01 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174438#comment-15174438 ] Nicholas Chammas commented on SPARK-7481: - Many people seem to be downgrading to use Spark built

[issue26463] asyncio-related (?) segmentation fault

2016-03-01 Thread Nicholas Chammas
Nicholas Chammas added the comment: Thanks for the tip. Enabling the fault handler reveals that the crash is happening from the Cryptography library. I'll move this issue there. Thank you. -- resolution: -> not a bug status: open -> closed Added file: http://bugs.python.org/fil
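The "fault handler" referenced above is CPython's built-in `faulthandler` module. A minimal sketch of how it is enabled (it can also be switched on externally with the `PYTHONFAULTHANDLER=1` environment variable or `python -X faulthandler`):

```python
import faulthandler

# Once enabled, CPython dumps a Python-level traceback on SIGSEGV,
# SIGFPE, SIGABRT, SIGBUS, and SIGILL, pointing at the frame that was
# active when the crash happened -- here, inside the C extension.
faulthandler.enable()
print(faulthandler.is_enabled())  # True
```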

[issue26463] asyncio-related (?) segmentation fault

2016-02-29 Thread Nicholas Chammas
Changes by Nicholas Chammas <nicholas.cham...@gmail.com>: Added file: http://bugs.python.org/file42052/stacktrace.txt ___ Python tracker <rep...@bugs.python.org> <http://bugs.python

[issue26463] asyncio-related (?) segmentation fault

2016-02-29 Thread Nicholas Chammas
New submission from Nicholas Chammas: Python 3.5.1, OS X 10.11.3. I have an application that uses asyncio and Cryptography (via the AsyncSSH library). Cryptography has some parts written in C, I believe. I'm testing my application by sending a keyboard interrupt while 2 tasks are working. My

Re: Is this likely to cause any problems?

2016-02-19 Thread Nicholas Chammas
The docs mention spark-ec2 because it is part of the Spark project. There are many, many alternatives to spark-ec2 out there like EMR, but it's probably not the place of the official docs to promote any one of those third-party solutions. On Fri, Feb 19, 2016 at 11:05 AM James Hammerton

Re: [core-workflow] Help needed: best way to convert hg repos to git?

2016-02-15 Thread Nicholas Chammas
Response from GitHub staff regarding using their Importer to import CPython: Unfortunately, the repository is too large to migrate using the importer. I’d recommend converting it to git locally using something like hg-fast-export. Due to its size, you’ll need to push

[issue8706] accept keyword arguments on most base type methods and builtins

2016-02-14 Thread Nicholas Chammas
Changes by Nicholas Chammas <nicholas.cham...@gmail.com>: -- nosy: +Nicholas Chammas ___ Python tracker <rep...@bugs.python.org> <http://bugs.pytho

[issue26334] bytes.translate() doesn't take keyword arguments; docs suggests it does

2016-02-12 Thread Nicholas Chammas
Nicholas Chammas added the comment: Yep, you're right. I'm just understanding now that we have lots of methods defined in C which have signatures like this. Is there an umbrella issue, perhaps, that covers adding support for keyword-based arguments to functions defined in C, like `translate

Re: [core-workflow] Help needed: best way to convert hg repos to git?

2016-02-11 Thread Nicholas Chammas
> I'm currently trying to import to see how it looks, have been stuck at 0% for a few minutes now. Doing the same myself. Got to 73% and it restarted. Am back at 73% now. Already reached out to GitHub to make them aware of the issue. Will report here when/if I have results. Nick

[issue26334] bytes.translate() doesn't take keyword arguments; docs suggests it does

2016-02-10 Thread Nicholas Chammas
Nicholas Chammas added the comment: So you're saying if `bytes.translate()` accepted keyword arguments, its signature would look something like this? ``` bytes.translate(table, delete=None) ``` I guess I was under the mistaken assumption that argument names in the docs always matched keyword

[issue26334] bytes.translate() doesn't take keyword arguments; docs suggests it does

2016-02-10 Thread Nicholas Chammas
New submission from Nicholas Chammas: The docs for `bytes.translate()` [0] show the following signature: ``` bytes.translate(table[, delete]) ``` However, calling this method with keyword arguments yields: ``` >>> b''.translate(table='la table', delete=b'delete') Traceback (most re
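At the time of this report (Python 3.5), `bytes.translate()` accepted its arguments positionally only, matching the documented signature `bytes.translate(table[, delete])`. A sketch of the positional usage (later releases added keyword support for `delete`):

```python
# Build a translation table and apply it positionally.
table = bytes.maketrans(b"abc", b"xyz")
print(b"abcabc".translate(table))  # b'xyzxyz'

# Passing None as the table deletes bytes without remapping any.
print(b"read this short text".translate(None, b"aeiou"))  # b'rd ths shrt txt'
```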

[issue26188] Provide more helpful error message when `await` is called inside non-`async` method

2016-02-02 Thread Nicholas Chammas
Nicholas Chammas added the comment: Related discussions about providing more helpful syntax error messages: * http://bugs.python.org/issue1634034 * http://bugs.python.org/issue400734 * http://bugs.python.org/issue20608 >From the discussion on issue1634034, it looks like providing bet

[issue7850] platform.system() should be "macosx" instead of "Darwin" on OSX

2016-01-30 Thread Nicholas Chammas
Nicholas Chammas added the comment: As of Python 3.5.1 [0], it looks like 1) the `aliased` and `terse` parameters of `platform.platform()` are documented to take integers instead of booleans (contrary to what Marc-Andre requested), and 2) calling `platform.platform()` with `aliased` set
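The parameters in question can be exercised directly; a minimal sketch showing that `aliased` and `terse` accept booleans interchangeably with the documented integers:

```python
import platform

# Both spellings work; the docs describe the parameters as integers,
# but booleans behave identically since bool is a subclass of int.
full = platform.platform()
short = platform.platform(aliased=True, terse=True)
print(full, "->", short)
```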

Re: [PyInstaller] Examples of projects using PyInstaller

2016-01-28 Thread Nicholas Chammas
You may have a look at borg-backup: https://github.com/borgbackup/borg > Thanks for the reference! Looks like this is the money shot. As long as you can pass all required parameters on

Re: Is spark-ec2 going away?

2016-01-27 Thread Nicholas Chammas
I noticed that in the main branch, the ec2 directory along with the spark-ec2 script is no longer present. It’s been moved out of the main repo to its own location: https://github.com/amplab/spark-ec2/pull/21 Is spark-ec2 going away in the next release? If so, what would be the best alternative

Re: Mutiple spark contexts

2016-01-27 Thread Nicholas Chammas
There is a lengthy discussion about this on the JIRA: https://issues.apache.org/jira/browse/SPARK-2243 On Wed, Jan 27, 2016 at 1:43 PM Herman van Hövell tot Westerflier < hvanhov...@questtec.nl> wrote: > Just out of curiosity. What is the use case for having multiple active > contexts in a

[jira] [Commented] (SPARK-5189) Reorganize EC2 scripts so that nodes can be provisioned independent of Spark master

2016-01-27 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15119220#comment-15119220 ] Nicholas Chammas commented on SPARK-5189: - FWIW, I found this issue to be practically unsolvable

[PyInstaller] Examples of projects using PyInstaller

2016-01-26 Thread Nicholas Chammas
Howdy, I’m looking for examples of projects using PyInstaller on a regular basis to package and release their work. I’m getting ready to make my first release using PyInstaller, and as a Python newbie I think it would be instructive for me to

[issue26188] Provide more helpful error message when `await` is called inside non-`async` method

2016-01-23 Thread Nicholas Chammas
New submission from Nicholas Chammas: Here is the user interaction: ```python $ python3 Python 3.5.1 (default, Dec 7 2015, 21:59:10) [GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.1.76)] on darwin Type "help", "copyright", "credits" or "license&qu
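The error being discussed can be reproduced by compiling a snippet that uses `await` inside a plain `def`. A minimal sketch (the function name is illustrative):

```python
# `await` inside a non-async function is a SyntaxError; the issue asks
# for the parser's message here to be more helpful than a generic one.
src = "def handler():\n    result = await fetch()\n"

try:
    compile(src, "<example>", "exec")
except SyntaxError as e:
    print("SyntaxError:", e.msg)
```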

[jira] [Commented] (SPARK-12824) Failure to maintain consistent RDD references in pyspark

2016-01-14 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098887#comment-15098887 ] Nicholas Chammas commented on SPARK-12824: -- Ah, good catch. This appears to be a known behavior

[jira] [Commented] (SPARK-12824) Failure to maintain consistent RDD references in pyspark

2016-01-14 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098336#comment-15098336 ] Nicholas Chammas commented on SPARK-12824: -- I can reproduce this issue. Here's a more concise

[issue26035] traceback.print_tb() takes `tb`, not `traceback` as a keyword argument

2016-01-06 Thread Nicholas Chammas
New submission from Nicholas Chammas: Here is traceback.print_tb()'s signature [0]: ``` def print_tb(tb, limit=None, file=None): ``` However, its documentation reads [1]: ``` .. function:: print_tb(traceback, limit=None, file=None) ``` Did the keyword argument change recently
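The mismatch is easy to confirm by inspecting the function itself; a minimal sketch showing that the first parameter is named `tb` (so only `tb=` works as a keyword):

```python
import inspect
import io
import sys
import traceback

# The code's parameter name, not the docs' `traceback`, is what keyword
# calls must use.
print(list(inspect.signature(traceback.print_tb).parameters))

try:
    1 / 0
except ZeroDivisionError:
    tb = sys.exc_info()[2]
    traceback.print_tb(tb=tb, limit=1, file=io.StringIO())  # accepted
```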

Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Nicholas Chammas
+1 Red Hat supports Python 2.6 on RHEL 5 until 2020, but otherwise yes, Python 2.6 is ancient history and the core Python developers stopped supporting it in 2013. RHEL 5 is not a good enough reason to continue support for Python

Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Nicholas Chammas
om > > wrote: > >> I don't see a reason Spark 2.0 would need to support Python 2.6. At this >> point, Python 3 should be the default that is encouraged. >> Most organizations acknowledge the 2.7 is common, but lagging behind the >> version they should theoretically use.

Re: [core-workflow] My initial thoughts on the steps/blockers of the transition

2016-01-05 Thread Nicholas Chammas
We can set a commit status that will show red if the user hasn’t signed the CLA (just like if Travis tests failed or so). No need to use a banner or anything. This is a great idea. Almost any automated check we want to run against PRs can be captured as a Travis/CI test that shows up on the PR

Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Nicholas Chammas
I think all the slaves need the same (or a compatible) version of Python installed since they run Python code in PySpark jobs natively. On Tue, Jan 5, 2016 at 6:02 PM Koert Kuipers <ko...@tresata.com> wrote: > interesting i didnt know that! > > On Tue, Jan 5, 2016 at 5:57 PM, N

Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Nicholas Chammas
va 7 and python 2.6, no matter how outdated that is. >>> >>> i dont like it either, but i cannot change it. >>> >>> we currently don't use pyspark so i have no stake in this, but if we did >>> i can assure you we would not upgrade to spark 2.x if python 2.6 was >&g

Re: [core-workflow] Standard library separation from core (was Re: My initial thoughts on the steps/blockers of the transition)

2016-01-04 Thread Nicholas Chammas
uary 2016 at 12:50, Nicholas Chammas <nicholas.cham...@gmail.com> > wrote: > > Something else to consider. We’ve long talked about splitting out the > stdlib > > to make it easier for the alternative implementations to import. If some > or > > all of them also swi

Re: Downloading Hadoop from s3://spark-related-packages/

2015-12-24 Thread Nicholas Chammas
for automated provisioning/deployments.” That would suffice. But as things stand now, I have to guess and wonder at this stuff. Nick ​ On Thu, Dec 24, 2015 at 5:43 AM Steve Loughran <ste...@hortonworks.com> wrote: > > On 24 Dec 2015, at 05:59, Nicholas Chammas <nicholas.cham...@gma

Re: A proposal for Spark 2.0

2015-12-23 Thread Nicholas Chammas
Yeah, I'd also favor maintaining docs with strictly temporary relevance on JIRA when possible. The wiki is like this weird backwater I only rarely visit. Don't we typically do this kind of stuff with an umbrella issue on JIRA? Tom, wouldn't that work well for you? Nick On Wed, Dec 23, 2015 at

Re: Downloading Hadoop from s3://spark-related-packages/

2015-12-23 Thread Nicholas Chammas
replaced the cgi one from before. Also it looks like the lua one >> also supports `action=download` with a filename argument. So you could >> just do something like >> >> wget >> http://www.apache.org/dyn/closer.lua?filename=hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz

[issue25768] compileall functions do not document or test return values

2015-12-20 Thread Nicholas Chammas
Nicholas Chammas added the comment: Alright, sounds good to me. Thank you for guiding me through the process! -- ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/i

[issue25768] compileall functions do not document or test return values

2015-12-19 Thread Nicholas Chammas
Nicholas Chammas added the comment: Ah, I see. The setup/teardown stuff runs for each test. So this is what I did: * Added a method to add a "bad" source file to the source directory. It gets cleaned up with the existing teardown method. * Used test_importlib to temporarily mutat

[jira] [Comment Edited] (SPARK-3821) Develop an automated way of creating Spark images (AMI, Docker, and others)

2015-12-18 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14203280#comment-14203280 ] Nicholas Chammas edited comment on SPARK-3821 at 12/18/15 9:08 PM

[jira] [Commented] (SPARK-2870) Thorough schema inference directly on RDDs of Python dictionaries

2015-12-11 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15053977#comment-15053977 ] Nicholas Chammas commented on SPARK-2870: - > Do you think its OK to close this issue? I have

[jira] [Commented] (SPARK-2870) Thorough schema inference directly on RDDs of Python dictionaries

2015-12-11 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15053131#comment-15053131 ] Nicholas Chammas commented on SPARK-2870: - Go for it. I don't think anyone else is. > Thoro

[issue25768] compileall functions do not document or test return values

2015-12-09 Thread Nicholas Chammas
Nicholas Chammas added the comment: I've added the tests as we discussed. A couple of comments: * I found it difficult to reuse the existing setUp() code so had to essentially repeat a bunch of very similar code to create "bad" files. Let me know if you think there is a better

Re: Fastest way to build Spark from scratch

2015-12-08 Thread Nicholas Chammas
fresh EC2 instance a significant chunk of the initial build > time might be due to artifact resolution + downloading. Putting > pre-populated Ivy and Maven caches onto your EC2 machine could shave a > decent chunk of time off that first build. > > On Tue, Dec 8, 2015 at 9:16 AM, Nicholas Cham

Re: Fastest way to build Spark from scratch

2015-12-08 Thread Nicholas Chammas
ou know > when some work in a terminal is ready, so you can do the first-thing-in-the > morning build-of-the-SNAPSHOTS > > mvn install -DskipTests -Pyarn,hadoop-2.6 -Dhadoop.version=2.7.1; say moo > > After that you can work on the modules you care about (via the -pl) > option

[issue24931] _asdict breaks when inheriting from a namedtuple

2015-12-08 Thread Nicholas Chammas
Nicholas Chammas added the comment: I know. I came across this issue after upgrading to the 3.5.1 release and seeing that vars(namedtuple) didn't work anymore. I looked through the changelog [0] for an explanation of why that might be and couldn't find one, so I posted that question on Stack

[issue24931] _asdict breaks when inheriting from a namedtuple

2015-12-08 Thread Nicholas Chammas
Nicholas Chammas added the comment: Should this change be called out in the 3.5.1 release docs? It makes some code that works on 3.5.0 break in 3.5.1. See: http://stackoverflow.com/q/34166469/877069 -- nosy: +Nicholas Chammas ___ Python tracker
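The breakage referenced in the linked Stack Overflow question can be sketched directly: as of 3.5.1, namedtuple instances no longer expose `__dict__`, so `vars()` fails and the supported spelling is the `_asdict()` method:

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
p = Point(1, 2)

# The public API keeps working across versions.
print(p._asdict())  # x=1, y=2 (an OrderedDict on older versions)

# vars() relied on the removed __dict__ property and now raises.
try:
    vars(p)
except TypeError as e:
    print("vars() no longer works:", e)
```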

[issue25768] compileall functions do not document or test return values

2015-12-05 Thread Nicholas Chammas
Nicholas Chammas added the comment: Absolutely. I'll add a "bad source file" to `setUp()` [0] and check return values as part of the existing checks in `test_compile_files()` [1]. Does that sound like a good plan to you? Also, I noticed that `compile_path()` has no tests. Sho

Re: Not all workers seem to run in a standalone cluster setup by spark-ec2 script

2015-12-04 Thread Nicholas Chammas
Quick question: Are you processing gzipped files by any chance? It's a common stumbling block people hit. See: http://stackoverflow.com/q/27531816/877069 Nick On Fri, Dec 4, 2015 at 2:28 PM Kyohey Hamaguchi wrote: > Hi, > > I have setup a Spark standalone-cluster, which

Re: [VOTE] Release Apache Spark 1.6.0 (RC1)

2015-12-02 Thread Nicholas Chammas
-0 If spark-ec2 is still a supported part of the project, then we should update its version lists as new releases are made. 1.5.2 had the same issue. https://github.com/apache/spark/blob/v1.6.0-rc1/ec2/spark_ec2.py#L54-L91 (I guess as part of the 2.0 discussions we should continue to discuss

[jira] [Created] (SPARK-12107) Update spark-ec2 versions

2015-12-02 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-12107: Summary: Update spark-ec2 versions Key: SPARK-12107 URL: https://issues.apache.org/jira/browse/SPARK-12107 Project: Spark Issue Type: Bug

[issue25768] compileall functions do not document return values

2015-12-01 Thread Nicholas Chammas
Nicholas Chammas added the comment: OK, here's a patch. I reviewed the doc style guide [0] but I'm not 100% sure if I'm using the appropriate tense. There are also a couple of lines that go a bit over 80 characters, but the file already had a few of those. Am happy to make any adjustments

[issue25768] compileall functions do not document return values

2015-12-01 Thread Nicholas Chammas
Nicholas Chammas added the comment: And I just signed the contributor agreement. (Some banner showed up when I attached the patch to this issue asking me to do so.) -- ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/i

[issue25768] compileall functions do not document return values

2015-12-01 Thread Nicholas Chammas
Nicholas Chammas added the comment: :thumbsup: Take your time. -- ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue25768> ___ ___ Pyth

[issue25775] Bug tracker emails go to spam

2015-12-01 Thread Nicholas Chammas
New submission from Nicholas Chammas: Not sure where to report this. Is there a component for the bug tracker itself? Anyway, Gmail sends emails from this bug tracker to spam and flags each one with the following message: > Why is this message in Spam? It is in violation of Googl

[issue25768] compileall functions do not document return values

2015-12-01 Thread Nicholas Chammas
Nicholas Chammas added the comment: Oh derp. It appears this is dup of issue24386. Apologies. -- status: open -> closed ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.or

[issue25768] compileall functions do not document return values

2015-12-01 Thread Nicholas Chammas
Nicholas Chammas added the comment: Whoops, wrong issue. Reopening. -- status: closed -> open ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.or

[issue25775] Bug tracker emails go to spam

2015-12-01 Thread Nicholas Chammas
Nicholas Chammas added the comment: Oh derp. It appears this is dup of issue24386. Apologies. -- status: open -> closed ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.or

[issue25768] compileall functions do not document return values

2015-12-01 Thread Nicholas Chammas
Nicholas Chammas added the comment: Exciting! I'm on it. -- ___ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue25768> ___ ___ Pyth

[issue25768] compileall functions do not document return values

2015-11-29 Thread Nicholas Chammas
New submission from Nicholas Chammas: I'm using the public functions of Python's built-in compileall module. https://docs.python.org/3/library/compileall.html#public-functions There doesn't appear to be documentation of what each of these functions returns. I figured out, for example
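The undocumented behavior being reported can be sketched as follows: `compileall.compile_file()` returns a true value on success and a false value on failure (the file names here are illustrative):

```python
import compileall
import tempfile
from pathlib import Path

tmp = Path(tempfile.mkdtemp())

good = tmp / "good.py"
good.write_text("x = 1\n")
bad = tmp / "bad.py"
bad.write_text("def broken(:\n")  # syntax error on purpose

# quiet=2 suppresses both listings and error output.
print(bool(compileall.compile_file(str(good), quiet=2)))  # True
print(bool(compileall.compile_file(str(bad), quiet=2)))   # False
```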

Re: Adding more slaves to a running cluster

2015-11-25 Thread Nicholas Chammas
spark-ec2 does not directly support adding instances to an existing cluster, apart from the special case of adding slaves to a cluster with a master but no slaves. There is an open issue to track adding this support, SPARK-2008, but it doesn't

[jira] [Comment Edited] (SPARK-9999) Dataset API on top of Catalyst/DataFrame

2015-11-23 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022735#comment-15022735 ] Nicholas Chammas edited comment on SPARK- at 11/23/15 8:06 PM

Fastest way to build Spark from scratch

2015-11-23 Thread Nicholas Chammas
Say I want to build a complete Spark distribution against Hadoop 2.6+ as fast as possible from scratch. This is what I’m doing at the moment: ./make-distribution.sh -T 1C -Phadoop-2.6 -T 1C instructs Maven to spin up 1 thread per available core. This takes around 20 minutes on an m3.large

Re: spark-ec2 script to launch cluster running Spark 1.5.2 built with HIVE?

2015-11-23 Thread Nicholas Chammas
Don't the Hadoop builds include Hive already? Like spark-1.5.2-bin-hadoop2.6.tgz? On Mon, Nov 23, 2015 at 7:49 PM Jeff Schecter wrote: > Hi all, > > As far as I can tell, the bundled spark-ec2 script provides no way to > launch a cluster running Spark 1.5.2 pre-built with

[jira] [Commented] (SPARK-9999) Dataset API on top of Catalyst/DataFrame

2015-11-23 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022735#comment-15022735 ] Nicholas Chammas commented on SPARK-: - [~sandyr] - Hmm, so are you saying that, generally

[jira] [Commented] (SPARK-9999) Dataset API on top of Catalyst/DataFrame

2015-11-23 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15022957#comment-15022957 ] Nicholas Chammas commented on SPARK-: - If you are referring to my comment, note that I am

[jira] [Commented] (SPARK-11903) Deprecate make-distribution.sh --skip-java-test

2015-11-21 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15020729#comment-15020729 ] Nicholas Chammas commented on SPARK-11903: -- Also, we could just leave the option

[jira] [Created] (SPARK-11903) Deprecate make-distribution.sh --skip-java-test

2015-11-21 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-11903: Summary: Deprecate make-distribution.sh --skip-java-test Key: SPARK-11903 URL: https://issues.apache.org/jira/browse/SPARK-11903 Project: Spark

[jira] [Commented] (SPARK-11903) Deprecate make-distribution.sh --skip-java-test

2015-11-21 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15020725#comment-15020725 ] Nicholas Chammas commented on SPARK-11903: -- cc [~pwendell] and [~srowen] - Y'all probably know

[jira] [Commented] (SPARK-11903) Deprecate make-distribution.sh --skip-java-test

2015-11-21 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15020728#comment-15020728 ] Nicholas Chammas commented on SPARK-11903: -- Oh, could you elaborate a bit? From what I

[jira] [Updated] (SPARK-11903) Deprecate make-distribution.sh --skip-java-test

2015-11-21 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-11903: - Description: The {{\-\-skip-java-test}} option to {{make-distribution.sh}} [does

[jira] [Commented] (SPARK-9999) Dataset API on top of Catalyst/DataFrame

2015-11-20 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15019214#comment-15019214 ] Nicholas Chammas commented on SPARK-: - Arriving a little late to this discussion. Quick

[jira] [Commented] (SPARK-11744) bin/pyspark --version doesn't return version and exit

2015-11-14 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15005572#comment-15005572 ] Nicholas Chammas commented on SPARK-11744: -- Not sure who would be the best person to comment

[jira] [Updated] (SPARK-11744) bin/pyspark --version doesn't return version and exit

2015-11-14 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-11744: - Description: {{bin/pyspark \-\-help}} offers a {{\-\-version}} option: {code} $ ./spark

[jira] [Created] (SPARK-11744) bin/pyspark --version doesn't return version and exit

2015-11-14 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-11744: Summary: bin/pyspark --version doesn't return version and exit Key: SPARK-11744 URL: https://issues.apache.org/jira/browse/SPARK-11744 Project: Spark

[jira] [Updated] (SPARK-11744) bin/pyspark --version doesn't return version and exit

2015-11-14 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-11744: - Description: {{bin/pyspark \-\-help}} offers a {{\-\-version}} option: {code} $ ./spark

Re: Upgrading Spark in EC2 clusters

2015-11-12 Thread Nicholas Chammas
spark-ec2 does not offer a way to upgrade an existing cluster, and from what I gather, it wasn't intended to be used to manage long-lasting infrastructure. The recommended approach really is to just destroy your existing cluster and launch a new one with the desired configuration. If you want to
