On Sat, May 7, 2016 at 11:48 PM, Buntu Dev <buntu...@gmail.com> wrote:
I'm using the PySpark DataFrame API to sort by a specific column and then
save the DataFrame as a Parquet file, but the resulting Parquet file doesn't
seem to be sorted. Applying the sort and doing a head() on the result shows
the rows correctly sorted by the 'value' column in descending order.
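A minimal sketch of the sequence being described, in Scala (the original
poster used PySpark; the column name 'value' comes from the message, the
paths are hypothetical):

import org.apache.spark.sql.functions.desc

// Global sort by 'value', descending.
val sorted = sqlContext.read.parquet("/data/input").orderBy(desc("value"))

sorted.show(5)  // the head of the result comes back correctly sorted

// Each task writes its own file; rows are ordered within each output
// file, but a later read does not guarantee any order across files.
sorted.write.parquet("/data/output_sorted")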
On Mon, May 2, 2016 at 6:21 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> Please consider decreasing block size.
>
> Thanks
>
On May 1, 2016, at 9:19 PM, Buntu Dev <buntu...@gmail.com> wrote:
I got a 10g limit on the executors and am operating on a Parquet dataset
with a 70M block size and 200 blocks. I keep hitting the memory limits when
doing a 'select * from t1 order by c1 limit 1000000' (i.e., 1M rows). It
works if I limit to, say, 100k. What are the options to save a large dataset
without hitting the memory limits?
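A sketch of the failing pattern, writing the result out rather than
collecting it (table and path names hypothetical):

// A large global limit after a sort is typically evaluated on a single
// partition, which concentrates memory pressure in one place. Writing
// the result keeps it distributed instead of pulling it to the driver.
val top = sqlContext.sql("select * from t1 order by c1 limit 1000000")
top.write.parquet("/data/top_rows")  // hypothetical output path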
Krishna <research...@gmail.com> wrote:
> I recently encountered similar network-related errors and was able to fix
> them by applying the ethtool updates described here:
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-5085
I would ultimately want to store the result set as parquet. Are there any
other options to handle this?
Thanks!
On Wed, Apr 27, 2016 at 11:10 AM, Buntu Dev <buntu...@gmail.com> wrote:
I got 14GB of Parquet data, and when trying to apply an order by using
Spark SQL and save the first 1M rows, it keeps failing with "Connection
reset by peer: socket write error" on the executors. I've allocated about
10g to both the driver and the executors, along with setting maxResultSize
to 10g, but it still fails with the same error.
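A sketch of the configuration being described (the values come from the
message; exactly how they were set is an assumption):

val conf = new org.apache.spark.SparkConf()
  .set("spark.executor.memory", "10g")
  .set("spark.driver.maxResultSize", "10g")

// Note: the driver's own heap must be set before the driver JVM starts,
// e.g. via spark-submit --driver-memory 10g; setting it in SparkConf at
// runtime has no effect on an already-running driver.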
> Could you provide a way to reproduce it (you could generate a fake
> dataset)?
>
> On Sat, Apr 9, 2016 at 4:33 PM, Buntu Dev <buntu...@gmail.com> wrote:
> > I've allocated about 4g for the driver. For the count stage, I notice the
> > Shuffle Write to be 13.9 GB.
> Looks like the exception occurred on the driver.
>
> Consider increasing the values for the following config:
>
> conf.set("spark.driver.memory", "10240m")
> conf.set("spark.driver.maxResultSize", "2g")
>
> Cheers
>
On Sat, Apr 9, 2016 at 7:51 PM, Buntu Dev <buntu...@gmail.com> wrote:
I'm running this motif pattern against 1.5M vertices (5.5 MB) and 10M
edges (60 MB):

tgraph.find("(a)-[]->(b); (c)-[]->(b); (c)-[]->(d)")

I keep running into Java heap space errors:

ERROR actor.ActorSystemImpl: Uncaught fatal error from thread
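A minimal sketch of the setup being described, using the GraphFrames API
(the vertex and edge sources are hypothetical; tgraph and the motif string
come from the message):

import org.graphframes.GraphFrame

// GraphFrames expects an "id" column on vertices and "src"/"dst" on edges.
val vertices = sqlContext.read.parquet("/data/vertices")  // hypothetical path
val edges    = sqlContext.read.parquet("/data/edges")     // hypothetical path
val tgraph   = GraphFrame(vertices, edges)

// Each motif term adds a join, so the intermediate result can grow far
// larger than the input graph, which is a common source of heap errors.
val matches = tgraph.find("(a)-[]->(b); (c)-[]->(b); (c)-[]->(d)")
matches.count()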
I've allocated about 4g for the driver. For the count stage, I notice the
Shuffle Write to be 13.9 GB.
On Sat, Apr 9, 2016 at 11:43 AM, Ndjido Ardo BAR <ndj...@gmail.com> wrote:
> What's the size of your driver?
Actually, df.show() works, displaying 20 rows, but df.count() is what
causes the driver to run out of memory. There are just 3 INT columns. Any
idea what could be the reason?
On Sat, Apr 9, 2016 at 10:47 AM, wrote:
> You seem to have a lot of columns :-) !
>
I tried setting both the HDFS and Parquet block sizes, but the write to
Parquet did not seem to have any effect on the total number of blocks or
the average block size. Here is what I did:
sqlContext.setConf("dfs.blocksize", "134217728")
sqlContext.setConf("parquet.block.size", "134217728")
You may want to read this post regarding Spark with Drools:
http://blog.cloudera.com/blog/2015/11/how-to-build-a-complex-event-processing-app-on-apache-spark-and-drools/
On Wed, Nov 4, 2015 at 8:05 PM, Daniel Mahler wrote:
> I am not familiar with any rule engines on Spark
Thanks. I was using Scala 2.11.1 and was able to
use algebird-core_2.10-0.1.11.jar with spark-shell.
On Thu, Oct 30, 2014 at 8:22 AM, Ian O'Connell i...@ianoconnell.com wrote:
What's the error with the 2.10 version of algebird?
Thanks Akhil.
On Mon, Oct 20, 2014 at 1:57 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
It's a known bug in JDK7 and OSX's naming convention; here's how to resolve
it:
1. Get the Snappy jar file from
http://central.maven.org/maven2/org/xerial/snappy/snappy-java/
2. Copy the appropriate
Thanks Sean, but I'm importing org.apache.spark.streaming.StreamingContext._
Here are the spark imports:
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.kafka._
import org.apache.spark.SparkConf
val stream
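The snippet cuts off at the val stream declaration. With those imports, a
typical construction from that era looks like this (the ZooKeeper quorum,
group id, and topic name are hypothetical):

val conf = new SparkConf().setAppName("KafkaStream")
val ssc = new StreamingContext(conf, Seconds(10))

// createStream(ssc, zkQuorum, groupId, Map(topic -> numThreads))
val stream = KafkaUtils.createStream(
  ssc, "zk-host:2181", "my-group", Map("my-topic" -> 1))

stream.map(_._2).print()  // each element is a (key, value) pair
ssc.start()
ssc.awaitTermination()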
impala, then it may allow it if the schema changes are append-only.
Otherwise, existing Parquet files have to be migrated to the new schema.
----- Original Message -----
From: Buntu Dev buntu...@gmail.com
To: Soumitra Kumar kumar.soumi...@gmail.com
Cc: u...@spark.incubator.apache.org
Sent: Tuesday
Thanks for the update. I'm interested in writing the results to MySQL as
well; can you shed some light on, or share a code sample of, how you set up
the driver/connection pool, etc.?
On Thu, Sep 25, 2014 at 4:00 PM, maddenpj madde...@gmail.com wrote:
Update for posterity, so once again I solved the problem
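maddenpj's actual solution is cut off above. The pattern usually
recommended for writing streaming results to MySQL looks roughly like this
(assuming results is a DStream[(String, Long)]; the JDBC URL and table are
hypothetical):

import java.sql.DriverManager

results.foreachRDD { rdd =>
  rdd.foreachPartition { rows =>
    // One connection per partition, created inside the closure so it is
    // never serialized from the driver.
    val conn = DriverManager.getConnection(
      "jdbc:mysql://db-host:3306/mydb", "user", "pass")
    val stmt = conn.prepareStatement("INSERT INTO counts (word, n) VALUES (?, ?)")
    rows.foreach { case (word, n) =>
      stmt.setString(1, word)
      stmt.setLong(2, n)
      stmt.executeUpdate()
    }
    conn.close()
  }
}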
the same thing in 1.0.0 using the DSL only. Just curious: why don't you use
the hql() / sql() methods and pass a query string in?
[1] https://github.com/apache/spark/pull/1211/files
On Thu, Jul 31, 2014 at 2:20 PM, Buntu Dev buntu...@gmail.com wrote:
Thanks Zongheng for the pointer.
Thanks Michael for confirming!
On Thu, Jul 31, 2014 at 2:43 PM, Michael Armbrust mich...@databricks.com
wrote:
The performance should be the same using the DSL or SQL strings.
On Thu, Jul 31, 2014 at 2:36 PM, Buntu Dev buntu...@gmail.com wrote:
I was not sure if registerAsTable
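For context, the two styles being compared, as they looked in the Spark 1.0
SchemaRDD API (the people table and its columns are hypothetical):

import sqlContext._  // brings in the symbol-based DSL implicits

// SQL-string style: register the SchemaRDD and query it by name.
people.registerAsTable("people")
val adultsSql = sqlContext.sql("SELECT name FROM people WHERE age > 21")

// DSL style: the same query expressed with Scala symbols.
val adultsDsl = people.where('age > 21).select('name)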