Re: Using Zeppelin with Spark FP

2016-09-11 Thread andy petrella
Heya, probably worth giving the Spark Notebook a go then. It can plot any Scala data (collections, RDDs, DataFrames, Datasets, custom types, ...); all plots are reactive, so they can deal with any sort of incoming data. You can ask on the Gitter

Access HDFS within Spark Map Operation

2016-09-11 Thread Saliya Ekanayake
Hi, I've got a text file where each line is a record. For each record, I need to process a file in HDFS. So if I represent these records as an RDD and invoke a map() operation on them, how can I access HDFS within that map()? Do I have to create a Spark context within map(), or is there a
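A minimal sketch of one common approach (an assumption, not taken from the thread): never create a SparkContext inside map(); instead open a Hadoop FileSystem handle on the executor, e.g. once per partition via mapPartitions. The input paths below are hypothetical.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    val records = sc.textFile("hdfs:///input/records.txt")  // hypothetical path
    val processed = records.mapPartitions { iter =>
      // One FileSystem handle per partition, created on the executor
      val fs = FileSystem.get(new Configuration())
      iter.map { record =>
        val in = fs.open(new Path(s"/data/$record"))  // hypothetical file layout
        try scala.io.Source.fromInputStream(in).mkString  // per-file work goes here
        finally in.close()
      }
    }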

Spark Save mode "Overwrite" -Lock wait timeout exceeded; try restarting transaction Error

2016-09-11 Thread Subhajit Purkayastha
I am using Spark 1.5.2 with the MemSQL database as a persistent repository. I am trying to update rows (based on the primary key) when a row appears more than once (basically run the save load as an upsert operation). val UpSertConf = SaveToMemSQLConf(msc.memSQLConf,

The 8th and the Largest Spark Summit is less than 8 weeks away!

2016-09-11 Thread Jules Damji
Fellow Sparkers! With every Spark Summit, an Apache Spark community event, the number of attending users and developers grows. This is the eighth Summit, held in one of my favorite cosmopolitan cities in the European Union: Brussels. We are offering a special promo code* for all Apache Spark users and

Re: GraphX drawing algorithm

2016-09-11 Thread Michael Malak
In chapter 10 of Spark GraphX in Action, we describe how to use Zeppelin with d3.js to render graphs using d3's force-directed rendering algorithm. The source code can be downloaded for free from https://www.manning.com/books/spark-graphx-in-action From: agc studio

Re: Using Zeppelin with Spark FP

2016-09-11 Thread Jeff Zhang
You can plot a DataFrame, but that is not supported for RDDs AFAIK. On Mon, Sep 12, 2016 at 5:12 AM, Mich Talebzadeh wrote: > Hi, > > Zeppelin is getting better. > > In its description it says: > > [image: Inline images 1] > > So far so good. One feature that I have not
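For reference, a small sketch (assuming a Spark 1.x SQLContext in scope; the table name is arbitrary) of turning an RDD into a DataFrame so Zeppelin can chart it, e.g. from a %sql paragraph:

    import sqlContext.implicits._  // Spark 1.x; in Spark 2.x use spark.implicits._

    val rdd = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 3)))
    val df  = rdd.toDF("key", "value")  // RDD -> DataFrame with named columns
    df.registerTempTable("kv")          // Spark 2.x: createOrReplaceTempView
    // In Zeppelin: %sql SELECT key, value FROM kv  -- the result can be plotted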

GraphX drawing algorithm

2016-09-11 Thread agc studio
Hi all, I was wondering if a force-directed graph drawing algorithm has been implemented for GraphX? Thanks

Using Zeppelin with Spark FP

2016-09-11 Thread Mich Talebzadeh
Hi, Zeppelin is getting better. In its description it says: [image: Inline images 1] So far so good. One feature that I have not managed to get working is creating plots with Spark functional programming. I can get SQL going by connecting to the Spark Thrift Server, and you can plot the results

Re: Spark with S3 DirectOutputCommitter

2016-09-11 Thread Steve Loughran
> On 9 Sep 2016, at 21:54, Srikanth wrote: > > Hello, > > I'm trying to use DirectOutputCommitter for s3a in Spark 2.0. I've tried a > few configs and none of them seem to work. > Output always creates a _temporary directory. Rename is killing performance. > I read some
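Context that may help when reading this thread (an assumption on my part, not Steve's truncated reply): Spark 2.0 removed the direct output committers, and a commonly suggested mitigation is the v2 FileOutputCommitter algorithm (Hadoop 2.7+), which moves task output into the destination without the final job-level rename:

    // Set before any write; both keys are standard Hadoop/Spark configuration.
    sc.hadoopConfiguration.set(
      "mapreduce.fileoutputcommitter.algorithm.version", "2")
    // or at submit time:
    //   spark-submit --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2 ...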

Re: Spark + Parquet + IBM Block Storage at Bluemix

2016-09-11 Thread Steve Loughran
On 9 Sep 2016, at 17:56, Daniel Lopes wrote: Hi, can someone help? I'm trying to use Parquet in IBM Block Storage with Spark, but when I try to load I get this error: using this config credentials = { "name": "keystone", "auth_url":
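A minimal sketch of the usual wiring (an assumption based on the hadoop-openstack Swift connector, not taken from the thread; the service name "keystone", endpoint, and credential values are placeholders):

    val svc = "keystone"  // arbitrary service name, reused in the swift:// URL
    val hc = sc.hadoopConfiguration
    hc.set(s"fs.swift.service.$svc.auth.url", "https://identity.example.com/v2.0/tokens")
    hc.set(s"fs.swift.service.$svc.username", "user")
    hc.set(s"fs.swift.service.$svc.password", "secret")
    hc.set(s"fs.swift.service.$svc.tenant",   "tenant-id")
    hc.set(s"fs.swift.service.$svc.public",   "true")
    val df = sqlContext.read.parquet(s"swift://mycontainer.$svc/path/to/data.parquet")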

Re: Get spark metrics in code

2016-09-11 Thread Steve Loughran
> On 9 Sep 2016, at 13:20, Han JU wrote: > > Hi, > > I'd like to know if there's a way to get Spark's metrics from code. > For example > > val sc = new SparkContext(conf) > val result = myJob(sc, ...) > result.save(...) > > val gauge =
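One way to get at metrics programmatically (a sketch against the Spark 2.0 listener API; not necessarily what the thread settled on) is to register a SparkListener that accumulates task metrics as the job runs:

    import java.util.concurrent.atomic.AtomicLong
    import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

    val bytesRead = new AtomicLong(0L)
    sc.addSparkListener(new SparkListener {
      override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
        // Note: in Spark 1.x inputMetrics is an Option; in 2.x it is always present
        val m = taskEnd.taskMetrics
        if (m != null) bytesRead.addAndGet(m.inputMetrics.bytesRead)
      }
    })
    // run the job, then read bytesRead.get() once it completes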

"Too many elements to create a power set" on Elasticsearch

2016-09-11 Thread Kevin Burton
1.6.1 and 1.6.2 don't work on our Elasticsearch setup because we use daily indexes. We get the error: "Too many elements to create a power set". It works on SINGLE indexes, but if I specify content_* then I get this error. I don't see this documented anywhere. Is this a known issue? Is there

Re: Spark metrics when running with YARN?

2016-09-11 Thread Jacek Laskowski
Hi Vladimir, You'd have to ask your cluster manager for all the running Spark applications. I'm pretty sure YARN and Mesos can do that, but I'm unsure about Spark Standalone. This is certainly not something a Spark application's web UI could do for you, since it is designed to handle the

Re: Spark metrics when running with YARN?

2016-09-11 Thread Vladimir Tretyakov
Hello Jacek, thanks a lot, it works. Is there a way to get the list of running applications from a REST API? Or do I have to try connecting to ports 4040, 4041, ... 40xx and check whether each port answers? Best regards, Vladimir. On Sat, Sep 10, 2016 at 6:00 AM, Jacek Laskowski wrote: > Hi,
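On YARN there is no need to probe 40xx ports: the ResourceManager exposes a REST endpoint listing applications. A sketch (the RM host below is a placeholder):

    // GET ws/v1/cluster/apps from the ResourceManager, filtered to running Spark apps
    val rm = "http://resourcemanager:8088"  // placeholder RM address
    val apps = scala.io.Source.fromURL(
      s"$rm/ws/v1/cluster/apps?states=RUNNING&applicationTypes=SPARK").mkString
    // Each entry in the returned JSON carries a trackingUrl; that application's
    // own REST API then lives under <trackingUrl>/api/v1/applications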

Re: Selecting the top 100 records per group by?

2016-09-11 Thread Mich Talebzadeh
You can of course do this using FP. val wSpec = Window.partitionBy('security).orderBy(desc("price")) df2.filter('security > " ").select(dense_rank().over(wSpec).as("rank"), 'timecreated, 'security, substring('price, 1, 7)).filter('rank <= 10).show HTH Dr Mich Talebzadeh LinkedIn *

Re: SparkR API problem with subsetting distributed data frame

2016-09-11 Thread Bene
I am calling dirs(x, dat) with a number for x and a distributed DataFrame for dat, like dirs(3, df). With your logical expression, Felix, I would get another DataFrame, right? That is not what I need; I need to extract a single value from a specific cell for my calculations. Is that somehow possible?

RE: classpath conflict with spark internal libraries and the spark shell.

2016-09-11 Thread Mendelson, Assaf
You can try shading the jar; look at the Maven Shade plugin. From: Benyi Wang [mailto:bewang.t...@gmail.com] Sent: Saturday, September 10, 2016 1:35 AM To: Colin Kincaid Williams Cc: user@spark.apache.org Subject: Re: classpath conflict with spark internal libraries and

RE: Selecting the top 100 records per group by?

2016-09-11 Thread Mendelson, Assaf
You can also create a custom aggregation function. It might provide better performance than dense_rank. Consider the following example, which collects everything into a list: class CollectListFunction[T](val colType: DataType) extends UserDefinedAggregateFunction { def inputSchema: StructType =
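The snippet is cut off by the digest; for orientation, here is one plausible completion of such a collect-as-list UDAF (a sketch against the UserDefinedAggregateFunction API available since Spark 1.5, not necessarily Assaf's original code):

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
    import org.apache.spark.sql.types._

    class CollectListFunction[T](val colType: DataType) extends UserDefinedAggregateFunction {
      def inputSchema: StructType  = new StructType().add("value", colType)
      def bufferSchema: StructType = new StructType().add("list", ArrayType(colType))
      def dataType: DataType       = ArrayType(colType)
      def deterministic: Boolean   = true
      def initialize(buffer: MutableAggregationBuffer): Unit =
        buffer(0) = Seq.empty[T]                              // start with an empty list
      def update(buffer: MutableAggregationBuffer, input: Row): Unit =
        buffer(0) = buffer.getSeq[T](0) :+ input.getAs[T](0)  // append one input value
      def merge(b1: MutableAggregationBuffer, b2: Row): Unit =
        b1(0) = b1.getSeq[T](0) ++ b2.getSeq[T](0)            // combine partial lists
      def evaluate(buffer: Row): Any = buffer.getSeq[T](0)
    }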

RE: add jars like spark-csv to ipython notebook with pyspakr

2016-09-11 Thread Mendelson, Assaf
In my case I do the following: export PYSPARK_DRIVER_PYTHON_OPTS="notebook --no-browser" pyspark --jars myjar.jar --driver-class-path myjar.jar hope this helps… From: pseudo oduesp [mailto:pseudo20...@gmail.com] Sent: Friday, September 09, 2016 3:55 PM To:

Re: Selecting the top 100 records per group by?

2016-09-11 Thread Mich Talebzadeh
DENSE_RANK will give you an ordering and sequence within a particular column. This is Hive: var sqltext = """ | SELECT RANK, timecreated, security, price | FROM ( | SELECT timecreated, security, price, | DENSE_RANK() OVER (ORDER BY price DESC) AS RANK |
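And the DataFrame-API equivalent of the thread's actual question, top N rows per group (a sketch assuming a hypothetical df with security and price columns):

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{col, dense_rank, desc}

    // Rank rows within each security by descending price, then keep the top 100
    val w = Window.partitionBy("security").orderBy(desc("price"))
    val top100 = df
      .withColumn("rank", dense_rank().over(w))
      .filter(col("rank") <= 100)
      .drop("rank")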