Re: Handling stale PRs

2014-08-26 Thread Patrick Wendell
Hey Nicholas, Thanks for bringing this up. There are a few dimensions to this... one is that it's actually procedurally difficult for us to close pull requests. I've proposed several different solutions to ASF infra to streamline the process, but thus far they haven't been open to any of my

Re: [Spark SQL] off-heap columnar store

2014-08-26 Thread Evan Chan
What would be the timeline for the parquet caching work? The reason I'm asking about the columnar compressed format is that there are some problems for which Parquet is not practical. On Mon, Aug 25, 2014 at 1:13 PM, Michael Armbrust mich...@databricks.com wrote: What is the plan for getting

Re: [Spark SQL] off-heap columnar store

2014-08-26 Thread Michael Armbrust
Any initial proposal or design about the caching to Tachyon that you can share so far? Caching parquet files in tachyon with saveAsParquetFile and then reading them with parquetFile should already work. You can use SQL on these tables by using registerTempTable. Some of the general parquet
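
As a rough sketch of the workflow Michael describes (Spark 1.1-era API, run from spark-shell; the Tachyon URI, case class, and table name below are placeholders):

    import org.apache.spark.sql.SQLContext

    case class Person(name: String, age: Int)

    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD  // implicit RDD[Product] -> SchemaRDD

    // Write a SchemaRDD out as Parquet on a Tachyon-backed path.
    val people = sc.parallelize(Seq(Person("alice", 30), Person("bob", 19)))
    people.saveAsParquetFile("tachyon://tachyon-master:19998/tables/people.parquet")

    // Later (or from another application), read the Parquet files back and
    // register them so they can be queried with SQL.
    val cached = sqlContext.parquetFile("tachyon://tachyon-master:19998/tables/people.parquet")
    cached.registerTempTable("people")
    sqlContext.sql("SELECT name FROM people WHERE age >= 21").collect()

Whether the reads are actually served from Tachyon memory depends on having the Tachyon client on the classpath and the under-filesystem configured, which the thread doesn't cover.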

CoHadoop Papers

2014-08-26 Thread Gary Malouf
One of my colleagues has been questioning me as to why Spark/HDFS makes no attempt to co-locate related data blocks. He pointed to this paper: http://www.vldb.org/pvldb/vol4/p575-eltabakh.pdf from 2011 on the CoHadoop research and the performance improvements it yielded for Map/Reduce

Re: Handling stale PRs

2014-08-26 Thread Matthew Farrellee
On 08/26/2014 04:57 AM, Sean Owen wrote: On Tue, Aug 26, 2014 at 7:02 AM, Patrick Wendell pwend...@gmail.com wrote: Most other ASF projects I know just ignore these patches. I'd prefer if we Agree, this drives me crazy. It kills part of JIRA's usefulness. Spark is blessed/cursed with

Re: CoHadoop Papers

2014-08-26 Thread Gary Malouf
It appears support for this type of control over block placement is going out in the next version of HDFS: https://issues.apache.org/jira/browse/HDFS-2576 On Tue, Aug 26, 2014 at 7:43 AM, Gary Malouf malouf.g...@gmail.com wrote: One of my colleagues has been questioning me as to why Spark/HDFS

Re: too many CancelledKeyException throwed from ConnectionManager

2014-08-26 Thread Kousuke Saruta
Hi Shengzhe, I faced the same situation. I think Connection and ConnectionManager have some race condition issues, and the error you mentioned may be caused by them. Now I'm trying to resolve the issue in https://github.com/apache/spark/pull/2019. Please check it out. - Kousuke

Re: Handling stale PRs

2014-08-26 Thread Madhu
Sean Owen wrote Stale JIRAs are a symptom, not a problem per se. I also want to see the backlog cleared, but automatically closing doesn't help, if the problem is too many JIRAs and not enough committer-hours to look at them. Some noise gets closed, but some easy or important fixes may

Re: Handling stale PRs

2014-08-26 Thread Erik Erlandson
- Original Message - Another thing is that we should, IMO, err on the side of explicitly saying no or not yet to patches, rather than letting them linger without attention. We do get patches where the user is well intentioned, but it is Completely agree. The solution is partly

Re: CoHadoop Papers

2014-08-26 Thread Christopher Nguyen
Gary, do you mean Spark and HDFS separately, or Spark's use of HDFS? If the former, Spark does support copartitioning. If the latter, it's an HDFS scope that's outside of Spark. On that note, Hadoop does also make attempts to collocate data, e.g., rack awareness. I'm sure the paper makes useful

HiveContext, schemaRDD.printSchema get different dataTypes, feature or a bug? really strange and surprised...

2014-08-26 Thread chutium
is there any dataType auto convert or detect or something in HiveContext? all columns of a table are defined as string in the hive metastore, one column is total_price with values like 123.45, then this column will be recognized as dataType Float in HiveContext... is this a feature or a bug? it really
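
For anyone trying to reproduce the observation, a minimal sketch comparing the schema Spark SQL reports with what the metastore declares (Spark 1.1-era API; on 1.0.x the call would be hql instead of sql, and the table name here is a placeholder):

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)

    // Table declared in the Hive metastore with all columns as STRING,
    // e.g. an external table backed by a CSV SerDe.
    val sales = hiveContext.sql("SELECT * FROM sales")

    sales.printSchema()
    // the thread reports output like:  |-- total_price: float (nullable = true)
    // even though DESCRIBE sales in Hive shows total_price as string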

Re: Gradient descent and runMiniBatchSGD

2014-08-26 Thread RJ Nowling
Hi Alexander, Can you post a link to the code? RJ On Tue, Aug 26, 2014 at 6:53 AM, Ulanov, Alexander alexander.ula...@hp.com wrote: Hi, I've implemented back propagation algorithm using Gradient class and a simple update using Updater class. Then I run the algorithm with mllib's

Re: Handling stale PRs

2014-08-26 Thread Nicholas Chammas
On Tue, Aug 26, 2014 at 2:02 AM, Patrick Wendell pwend...@gmail.com wrote: I'd prefer if we took the approach of politely explaining why in the current form the patch isn't acceptable and closing it (potentially w/ tips on how to improve it or narrow the scope). Amen to this. Aiming for such

Re: HiveContext, schemaRDD.printSchema get different dataTypes, feature or a bug? really strange and surprised...

2014-08-26 Thread chutium
oops, I tried on a managed table, and the column types are not changed, so it is most likely due to the serde lib CSVSerDe (https://github.com/ogrodnek/csv-serde/blob/master/src/main/java/com/bizo/hive/serde/csv/CSVSerde.java#L123) or maybe CSVReader from opencsv?... but if the columns are defined as

Re: CoHadoop Papers

2014-08-26 Thread Gary Malouf
Christopher, can you expand on the co-partitioning support? We have a number of spark SQL tables (saved in parquet format) that all could be considered to have a common hash key. Our analytics team wants to do frequent joins across these different data-sets based on this key. It makes sense

Re: Handling stale PRs

2014-08-26 Thread Josh Rosen
Last weekend, I started hacking on a Google App Engine app for helping with pull request review (screenshot: http://i.imgur.com/wwpZKYZ.png).  Some of my basic goals (not all implemented yet): - Users sign in using GitHub and can browse a list of pull requests, including links to associated

Re: Handling stale PRs

2014-08-26 Thread Nicholas Chammas
OK, that sounds pretty cool. Josh, Do you see this app as encompassing or supplanting the functionality I described as well? Nick On Tue, Aug 26, 2014 at 2:21 PM, Josh Rosen rosenvi...@gmail.com wrote: Last weekend, I started hacking on a Google App Engine app for helping with pull request

Re: Handling stale PRs

2014-08-26 Thread Josh Rosen
Sure; App Engine supports cron and sending emails.  We can configure the app with Spark QA’s credentials in order to allow it to post comments on issues, etc. - Josh On August 26, 2014 at 11:38:08 AM, Nicholas Chammas (nicholas.cham...@gmail.com) wrote: OK, that sounds pretty cool. Josh,

Re: [SPARK-2878] Kryo serialisation with custom Kryo registrator failing

2014-08-26 Thread npanj
I have both SPARK-2878 and SPARK-2893. -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/SPARK-2878-Kryo-serialisation-with-custom-Kryo-registrator-failing-tp7719p8046.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

spark-ec2 1.0.2 creates EC2 cluster at wrong version

2014-08-26 Thread Nicholas Chammas
I downloaded the source code release for 1.0.2 from here http://spark.apache.org/downloads.html and launched an EC2 cluster using spark-ec2. After the cluster finishes launching, I fire up the shell and check the version: scala> sc.version res1: String = 1.0.1 The startup banner also shows the

Re: spark-ec2 1.0.2 creates EC2 cluster at wrong version

2014-08-26 Thread Shivaram Venkataraman
This is a chicken and egg problem in some sense. We can't change the ec2 script till we have made the release and uploaded the binaries -- But once that is done, we can't update the script. I think the model we support so far is that you can launch the latest spark version from the master branch

Re: Gradient descent and runMiniBatchSGD

2014-08-26 Thread RJ Nowling
Xiangrui, I posted a note on my JIRA for MiniBatch KMeans about the same problem -- sampling running in O(n). Can you elaborate on ways to get more efficient sampling? I think this will be important for a variety of stochastic algorithms. RJ On Tue, Aug 26, 2014 at 12:54 PM, Xiangrui Meng
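
To make the O(n) concern concrete, here is an illustrative sketch (not the actual MLlib code) of what per-iteration mini-batch sampling looks like at the RDD level:

    import org.apache.spark.rdd.RDD

    // Each iteration draws a fresh sample with RDD.sample, which scans every
    // partition of `data` even when miniBatchFraction is tiny -- so the
    // per-iteration sampling cost stays proportional to the full data size.
    def sgdIterations[T](data: RDD[T], numIterations: Int, miniBatchFraction: Double): Unit = {
      for (i <- 1 to numIterations) {
        val miniBatch = data.sample(false, miniBatchFraction, 42L + i)
        // ... compute and aggregate gradients over miniBatch ...
        miniBatch.count()  // stand-in for the gradient aggregation step
      }
    }

A representation that supports cheaper repeated draws is essentially what RJ proposes in the follow-up message.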

Re: Gradient descent and runMiniBatchSGD

2014-08-26 Thread RJ Nowling
Also, another idea: many algorithms that use sampling tend to do so multiple times. It may be beneficial to allow a transformation to a representation that is more efficient for multiple rounds of sampling. On Tue, Aug 26, 2014 at 4:36 PM, RJ Nowling rnowl...@gmail.com wrote: Xiangrui, I

Re: Gradient descent and runMiniBatchSGD

2014-08-26 Thread Ulanov, Alexander
Hi Xiangrui, Thanks for explanation, but I'm still missing something. In my experiments, if miniBatchFraction == 1.0, no matter how the data is partitioned (2, 4, 8, 16 partitions), the algorithm executes more or less in the same time. (I have 16 Workers). Reduce from runMiniBatchSGD takes

Re: Handling stale PRs

2014-08-26 Thread Nicholas Chammas
By the way, as a reference point, I just stumbled across the Discourse GitHub project and their list of pull requests https://github.com/discourse/discourse/pulls looks pretty neat. ~2,200 closed PRs, 6 open. Least recently updated PR dates to 8 days ago. Project started ~1.5 years ago. Dunno

Re: CoHadoop Papers

2014-08-26 Thread Michael Armbrust
It seems like there are two things here: - Co-locating blocks with the same keys to avoid network transfer. - Leveraging partitioning information to avoid a shuffle when data is already partitioned correctly (even if those partitions aren't yet on the same machine). The former seems more
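
To illustrate the second point at the RDD level, a minimal sketch for spark-shell (the paths and key extraction are placeholders; how this maps onto Spark SQL Parquet tables is a separate question):

    import org.apache.spark.SparkContext._
    import org.apache.spark.HashPartitioner

    val partitioner = new HashPartitioner(64)

    // Hash-partition both datasets on the shared key and cache them.
    val orders = sc.textFile("hdfs:///data/orders")
      .map(line => (line.split('\t')(0), line))   // (customerId, record)
      .partitionBy(partitioner).cache()

    val clicks = sc.textFile("hdfs:///data/clicks")
      .map(line => (line.split('\t')(0), line))
      .partitionBy(partitioner).cache()

    // Matching keys already live in corresponding partitions, so this join
    // is a narrow dependency and shuffles neither side.
    val joined = orders.join(clicks)

Because both RDDs share the same partitioner the join avoids a shuffle, but the matching partitions may still be read over the network if they sit on different nodes, which is where the block co-location in the first point would help.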

Re: CoHadoop Papers

2014-08-26 Thread Gary Malouf
Hi Michael, I think once that work is into HDFS, it will be great to expose this functionality via Spark. This is something worth pursuing because it could grant orders of magnitude perf improvements in cases when people need to join data. The second item would be very interesting, could yield

OutOfMemoryError when running sbt/sbt test

2014-08-26 Thread jay vyas
Hi spark. I've been trying to build spark, but I've been getting lots of oome exceptions. https://gist.github.com/jayunit100/d424b6b825ce8517d68c For the most part, they are of the form: java.lang.OutOfMemoryError: unable to create new native thread I've attempted to hard code the

Re: OutOfMemoryError when running sbt/sbt test

2014-08-26 Thread Mubarak Seyed
What is your ulimit value? On Tue, Aug 26, 2014 at 5:49 PM, jay vyas jayunit100.apa...@gmail.com wrote: Hi spark. I've been trying to build spark, but I've been getting lots of oome exceptions. https://gist.github.com/jayunit100/d424b6b825ce8517d68c For the most part, they are of the

Re: spark-ec2 1.0.2 creates EC2 cluster at wrong version

2014-08-26 Thread Matei Zaharia
This shouldn't be a chicken-and-egg problem, since the script fetches the AMI from a known URL. Seems like an issue in publishing this release. On August 26, 2014 at 1:24:45 PM, Shivaram Venkataraman (shiva...@eecs.berkeley.edu) wrote: This is a chicken and egg problem in some sense. We can't

Re: OutOfMemoryError when running sbt/sbt test

2014-08-26 Thread Jay Vyas
Thanks...! Some questions below. 1) you are suggesting that maybe this OOME is a symptom/red herring, and the true cause of it is that a thread can't spawn because of ulimit... If so, possibly this could be flagged early on in the build. And -- where are so many threads coming from that I need

Re: OutOfMemoryError when running sbt/sbt test

2014-08-26 Thread Anand Avati
Hi Jay, The recommended way to build spark from source is through the maven system. You would want to follow the steps in https://spark.apache.org/docs/latest/building-with-maven.html to set the MAVEN_OPTS to prevent OOM build errors. Thanks On Tue, Aug 26, 2014 at 5:49 PM, jay vyas

Re: Handling stale PRs

2014-08-26 Thread Madhu
Nicholas Chammas wrote Dunno how many committers Discourse has, but it looks like they've managed their PRs well. I hope we can do as well in this regard as they have. Discourse developers appear to eat their own dog food https://meta.discourse.org . Improved collaboration and a shared vision

Re: spark-ec2 1.0.2 creates EC2 cluster at wrong version

2014-08-26 Thread Tathagata Das
Yes, this was an oversight on my part. I have opened a JIRA for this. https://issues.apache.org/jira/browse/SPARK-3242 For the time being the workaround should be providing the version 1.0.2 explicitly as part of the script. TD On Tue, Aug 26, 2014 at 6:39 PM, Matei Zaharia