Re: Help Required on Spark - Convert DataFrame to List without using collect

2017-12-18 Thread Weichen Xu
Hi Sunitha, In the mapper function you cannot update outer variables such as `personLst.add(person)`; that does not work, which is why you got an empty list. You can use `rdd.collect()` to get a local list of `Person` objects first, then you can safely iterate over the local list and do any
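A minimal sketch of the two patterns described above, assuming a simple `Person` case class (the class and field names are illustrative, not taken from the original snippet): mutating a driver-side list inside a transformation has no effect on a cluster, while `collect()` brings the data back to the driver where a local list can be iterated safely.

```scala
import org.apache.spark.sql.SparkSession
import scala.collection.mutable.ListBuffer

case class Person(id: Long, name: String)

object CollectExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("collect-example").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq(Person(1, "a"), Person(2, "b")).toDF()

    // Broken pattern (in spirit, what the original code did): the ListBuffer
    // lives on the driver, and each task mutates its own deserialized copy of
    // the closure, so on a cluster the driver-side list stays empty.
    val personLst = ListBuffer[Person]()
    df.as[Person].rdd.map { p => personLst += p; p }.count()
    println(s"driver-side list size: ${personLst.size}")

    // Working pattern: collect() returns the rows to the driver as a local
    // collection that can be iterated and passed to other functions.
    val localPeople: Array[Person] = df.as[Person].collect()
    localPeople.foreach(p => println(p.id))

    spark.stop()
  }
}
```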

Re: Help Required on Spark - Convert DataFrame to List without using collect

2017-12-18 Thread Sunitha Chennareddy
Hi Jorn, In my case I have to call a common interface function, passing it the values of each RDD. I have tried iterating, but I was not able to trigger the common function from the call method, as commented in the snippet code in my earlier mail. Request you to please share your views. Regards Sunitha

Re: Help Required on Spark - Convert DataFrame to List without using collect

2017-12-18 Thread Jörn Franke
This is correct behavior. If you need to call another method, simply append another map, flatMap or whatever you need. Depending on your use case you may also use reduce and reduceByKey. However, you should never (!) use a global variable as in your snippet. This cannot work because you work in
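A minimal sketch of what "append another map" means here, with a hypothetical `commonFunction` standing in for the interface call from the original snippet: the result of the call becomes a new RDD instead of being pushed into a shared variable.

```scala
import org.apache.spark.sql.SparkSession

object ChainedMaps {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("chained-maps").master("local[*]").getOrCreate()

    // Stand-in for the "common interface function" mentioned in the thread.
    val commonFunction: Long => String = id => s"processed-$id"

    val ids = spark.sparkContext.parallelize(Seq(1L, 2L, 3L))

    // Instead of mutating a global variable inside a transformation, chain
    // another map that calls the function and returns its result.
    val results = ids.map(commonFunction)

    results.collect().foreach(println)
    spark.stop()
  }
}
```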

Re: Help Required on Spark - Convert DataFrame to List without using collect

2017-12-18 Thread Sunitha Chennareddy
Hi Deepak, I am able to map a row to the Person class; the issue is that I want to call another method. I tried converting to a list, and it is not working without using collect. Regards Sunitha On Tuesday, December 19, 2017, Deepak Sharma wrote: > I am not sure about java but in scala

Re: What does Blockchain technology mean for Big Data? And how will Hadoop/Spark play a role in it?

2017-12-18 Thread KhajaAsmath Mohammed
I am looking for the same answer too... I will wait for responses from other people. Sent from my iPhone > On Dec 18, 2017, at 10:56 PM, Gaurav1809 wrote: > > Hi All, > > Will Big Data tools & technology work with Blockchain in the future? Any possible > use cases that anyone is

What does Blockchain technology mean for Big Data? And how will Hadoop/Spark play a role in it?

2017-12-18 Thread Gaurav1809
Hi All, Will Big Data tools & technology work with Blockchain in the future? If there are any possible use cases that anyone is likely to face, please share. Thanks Gaurav -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

Re: Help Required on Spark - Convert DataFrame to List without using collect

2017-12-18 Thread Deepak Sharma
I am not sure about Java but in Scala it would be something like df.rdd.map { x => MyClass(x.getString(0), ...) } HTH --Deepak On Dec 19, 2017 09:25, "Sunitha Chennareddy" wrote: Hi All, I am new to Spark, I want to convert a DataFrame to a List without using
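A minimal, self-contained version of the pattern Deepak describes, mapping each Row of a DataFrame into a case class; the `MyClass` fields and column names are illustrative assumptions, not the actual schema from the thread.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative target class; the real one would mirror the DataFrame's schema.
case class MyClass(name: String, age: Int)

object RowToCaseClass {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("row-to-case-class").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq(("alice", 30), ("bob", 25)).toDF("name", "age")

    // Map each Row to the case class by position, as in the snippet above.
    val mapped = df.rdd.map { x => MyClass(x.getString(0), x.getInt(1)) }

    // Further processing (for example, calling another function per element)
    // can be chained as additional transformations instead of collecting.
    mapped.collect().foreach(println)
    spark.stop()
  }
}
```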

Help Required on Spark - Convert DataFrame to List without using collect

2017-12-18 Thread Sunitha Chennareddy
Hi All, I am new to Spark, and I want to convert a DataFrame to a List without using collect(). The main requirement is that I need to iterate through the rows of the DataFrame and call another function, passing it a column value of each row (person.getId()). Here is the snippet I have tried; kindly help me to resolve

Re: Please Help with DecisionTree/FeatureIndexer

2017-12-18 Thread Weichen Xu
Hi Marco, If you add the assembler at the start of the pipeline, like: ``` val pipeline = new Pipeline() .setStages(Array(assembler, labelIndexer, featureIndexer, dt, labelConverter)) ``` which error do you get? I think it should work fine if the `assembler` is added to the pipeline. Thanks. On
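A minimal sketch of such a pipeline with the assembler as the first stage, following the standard Spark ML decision-tree example; the toy data, column names, and `maxCategories` value are assumptions for illustration.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.ml.feature.{IndexToString, StringIndexer, VectorAssembler, VectorIndexer}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("dt-pipeline").master("local[*]").getOrCreate()

// Assumption: raw numeric columns "f1", "f2" and a string "label" column.
val data = spark.createDataFrame(Seq(
  (1.0, 0.5, "yes"),
  (0.0, 1.5, "no"),
  (1.0, 1.0, "yes"),
  (0.0, 0.5, "no")
)).toDF("f1", "f2", "label")

// 1. Assemble the raw columns into a single vector column first.
val assembler = new VectorAssembler()
  .setInputCols(Array("f1", "f2"))
  .setOutputCol("assembledFeatures")

// 2. Index the string labels (pre-fit so the converter below can reuse its
//    labels) and index categorical features inside the assembled vector.
val labelIndexer = new StringIndexer()
  .setInputCol("label")
  .setOutputCol("indexedLabel")
  .fit(data)

val featureIndexer = new VectorIndexer()
  .setInputCol("assembledFeatures")
  .setOutputCol("indexedFeatures")
  .setMaxCategories(4)

// 3. The tree itself, plus a converter mapping predictions back to string labels.
val dt = new DecisionTreeClassifier()
  .setLabelCol("indexedLabel")
  .setFeaturesCol("indexedFeatures")

val labelConverter = new IndexToString()
  .setInputCol("prediction")
  .setOutputCol("predictedLabel")
  .setLabels(labelIndexer.labels)

// Because the assembler is the first stage, every later stage sees the
// assembled features column, which is the point made in the reply above.
val pipeline = new Pipeline()
  .setStages(Array(assembler, labelIndexer, featureIndexer, dt, labelConverter))

val model = pipeline.fit(data)
model.transform(data).select("label", "predictedLabel").show()
```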

/tmp fills up to 100GB when using a window function

2017-12-18 Thread Mihai Iacob
This code generates files under /tmp...blockmgr... which do not get cleaned up after the job finishes. Is anything wrong with the code below? Or are there any known issues with Spark not cleaning up /tmp files? window = Window.\ partitionBy('***', 'date_str').\

Mapping words to vector sparkml CountVectorizerModel

2017-12-18 Thread Sandeep Nemuri
Hi All, I've used CountVectorizerModel in Spark ML and got the tf-idf of the words. The output column of the DataFrame looks like:
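A minimal sketch of mapping vector indices back to words via the fitted model's `vocabulary` array; the toy data and column names are assumptions.

```scala
import org.apache.spark.ml.feature.{CountVectorizer, CountVectorizerModel}
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("cv-vocab").master("local[*]").getOrCreate()
import spark.implicits._

// Toy input: each row is a bag of tokens.
val df = Seq(
  Array("spark", "kafka", "spark"),
  Array("hadoop", "spark")
).toDF("words")

val cvModel: CountVectorizerModel = new CountVectorizer()
  .setInputCol("words")
  .setOutputCol("features")
  .fit(df)

// vocabulary(i) is the term stored at index i of every output vector, so the
// indices of the sparse vectors can be translated back into words.
val vocab: Array[String] = cvModel.vocabulary

cvModel.transform(df).select("features").collect().foreach { row =>
  val vec = row.getAs[Vector]("features")
  vec.foreachActive { (idx, value) => print(s"${vocab(idx)} -> $value  ") }
  println()
}
```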

Re: Several Aggregations on a window function

2017-12-18 Thread Anastasios Zouzias
Hi, You can use https://twitter.github.io/algebird/, which provides implementations of interesting Monoids and ways to combine them into tuples (or products) of Monoids. Of course, you are not bound to use the algebird library, but it might help you bootstrap. On Mon, Dec 18, 2017 at 7:18
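A minimal sketch of the product-of-monoids idea, assuming algebird is on the classpath: the two components of the accumulator (a count and a sum) are each combined with their own monoid, so one pass produces both results. Algebird can also derive such tuple monoids implicitly; the explicit componentwise version below is just for clarity.

```scala
import com.twitter.algebird.Monoid

object ProductOfMonoids {
  // Combine two accumulators componentwise: each component uses its own monoid.
  def plus(a: (Long, Double), b: (Long, Double)): (Long, Double) =
    (Monoid.plus(a._1, b._1), Monoid.plus(a._2, b._2))

  def main(args: Array[String]): Unit = {
    val data = Seq(1.5, 2.5, 3.0)
    // A single pass computes (count, sum) together.
    val (count, sum) = data.map(x => (1L, x)).reduce(plus)
    println(s"count=$count sum=$sum")
  }
}
```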

Re: Several Aggregations on a window function

2017-12-18 Thread Julien CHAMP
It seems interesting; however, scalding seems to require being used outside of Spark? On Mon, Dec 18, 2017 at 17:15, Anastasios Zouzias wrote: > Hi Julien, > > I am not sure if my answer applies to the streaming part of your question. > However, in batch processing, if you

Re: kinesis throughput problems

2017-12-18 Thread Jeremy Kelley
Gourav, Yes, sorry. Apparently I failed to mention that I'm having these problems with Spark consuming from a Kinesis stream. I've been putting in late nights to figure this out and it's affecting my brain. :^) -jeremy -- Jeremy Kelley | Technical Director, Data jkel...@carbonblack.com | Carbon

Getting multiple regression metrics at once

2017-12-18 Thread OBones
Hello, I'm working with the ML package for regression purposes and I get good results on my data. I'm now trying to get multiple metrics at once; right now I'm doing what is suggested by the examples here: https://spark.apache.org/docs/2.1.0/ml-classification-regression.html Basically
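As a hedged sketch of one way to get several metrics from a single object (the "prediction"/"label" column names are the usual ones from the linked examples): the RDD-based `RegressionMetrics` class exposes RMSE, MSE, MAE, R2 and explained variance together, instead of re-running an evaluator once per metric.

```scala
import org.apache.spark.mllib.evaluation.RegressionMetrics
import org.apache.spark.sql.DataFrame

// predictions: a DataFrame with "prediction" and "label" columns, e.g. the
// output of model.transform(testData) from the linked examples.
def allMetrics(predictions: DataFrame): Unit = {
  val predictionAndLabel = predictions
    .select("prediction", "label")
    .rdd
    .map(row => (row.getDouble(0), row.getDouble(1)))

  val metrics = new RegressionMetrics(predictionAndLabel)
  println(s"RMSE               = ${metrics.rootMeanSquaredError}")
  println(s"MSE                = ${metrics.meanSquaredError}")
  println(s"MAE                = ${metrics.meanAbsoluteError}")
  println(s"R2                 = ${metrics.r2}")
  println(s"Explained variance = ${metrics.explainedVariance}")
}
```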

Re: Several Aggregations on a window function

2017-12-18 Thread Anastasios Zouzias
Hi Julien, I am not sure if my answer applies to the streaming part of your question. However, in batch processing, if you want to perform multiple aggregations over an RDD in a single pass, a common approach is to use multiple aggregators (a.k.a. tuple monoids); see below an example from
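The referenced example is truncated in the archive; the following is a minimal sketch of the same idea in plain Spark, accumulating a tuple of (sum, count, max) in a single pass over an RDD, where each tuple component behaves like its own monoid.

```scala
import org.apache.spark.sql.SparkSession

object SinglePassAggregations {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("single-pass-agg").master("local[*]").getOrCreate()
    val rdd = spark.sparkContext.parallelize(Seq(3.0, 1.0, 4.0, 1.0, 5.0))

    // The accumulator is (sum, count, max); the zero element and both combine
    // functions treat each component independently, i.e. a product of monoids.
    val (sum, count, max) = rdd.aggregate((0.0, 0L, Double.NegativeInfinity))(
      (acc, x) => (acc._1 + x, acc._2 + 1L, math.max(acc._3, x)),
      (a, b) => (a._1 + b._1, a._2 + b._2, math.max(a._3, b._3))
    )

    println(s"sum=$sum count=$count max=$max")
    spark.stop()
  }
}
```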

Re: How to properly execute `foreachPartition` in Spark 2.2

2017-12-18 Thread Liana Napalkova
Thanks, Timur. The problem is that if I run `foreachPartition`, then I cannot `start` the streaming query. Or perhaps I am missing something. From: Timur Shenkao Sent: 18 December 2017 16:11:06 To: Liana Napalkova Cc: Silvio Fiorito;

Re: How to properly execute `foreachPartition` in Spark 2.2

2017-12-18 Thread Cody Koeninger
You can't create a network connection to Kafka on the driver and then serialize it to send it to the executor. That's likely why you're getting serialization errors. Kafka producers are thread-safe and designed for use as a singleton. Use a lazy singleton instance of the producer on the executor,
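A minimal sketch of the lazy-singleton pattern Cody describes, assuming a String key/value producer and a hypothetical broker address: the producer is constructed lazily inside an object, so it is created once per executor JVM on first use rather than being serialized from the driver.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object KafkaSink {
  // lazy val: created once per JVM (i.e. per executor) on first access.
  lazy val producer: KafkaProducer[String, String] = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker:9092") // assumption: broker address
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    new KafkaProducer[String, String](props)
  }
}

// Usage inside foreachPartition: the object is initialized on the executor,
// so no producer instance ever crosses the driver/executor boundary.
// rdd.foreachPartition { rows =>
//   rows.foreach(r => KafkaSink.producer.send(new ProducerRecord("topic", r.toString)))
// }
```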

Re: How to properly execute `foreachPartition` in Spark 2.2

2017-12-18 Thread Liana Napalkova
If there is no other way, then I will follow this recommendation. From: Silvio Fiorito Sent: 18 December 2017 16:20:03 To: Liana Napalkova; user@spark.apache.org Subject: Re: How to properly execute `foreachPartition` in Spark 2.2

Re: How to properly execute `foreachPartition` in Spark 2.2

2017-12-18 Thread Silvio Fiorito
Couldn’t you readStream from Kafka, do your transformations, map the rows of the transformed input into what you need to send to Kafka, and then writeStream to Kafka? From: Liana Napalkova Date: Monday, December 18, 2017 at 10:07 AM To: Silvio Fiorito
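A minimal sketch of the read-transform-write flow Silvio suggests, using the Structured Streaming Kafka source and sink; the broker address, topic names, transformation, and checkpoint path are assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("kafka-to-kafka").master("local[*]").getOrCreate()
import spark.implicits._

// Read from Kafka as a streaming DataFrame.
val input = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092") // assumption
  .option("subscribe", "input-topic")               // assumption
  .load()

// Transform: here just upper-case the value; replace with the real logic.
val transformed = input
  .selectExpr("CAST(value AS STRING) AS value")
  .select(upper($"value").as("value"))

// Write the transformed rows back to Kafka.
val query = transformed.writeStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")           // assumption
  .option("topic", "output-topic")                            // assumption
  .option("checkpointLocation", "/tmp/kafka-sink-checkpoint") // assumption
  .start()

query.awaitTermination()
```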

Re: How to properly execute `foreachPartition` in Spark 2.2

2017-12-18 Thread Timur Shenkao
Spark Dataset / DataFrame has foreachPartition() as well. Its implementation is much more efficient than the RDD one. There are tons of code snippets, say
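A minimal sketch of calling foreachPartition directly on a DataFrame, with no detour through .rdd; passing a named function value keeps the call unambiguous across Spark/Scala versions.

```scala
import org.apache.spark.sql.{Row, SparkSession}

val spark = SparkSession.builder().appName("ds-foreach-partition").master("local[*]").getOrCreate()
val df = spark.range(0, 10).toDF("id")

// Per-partition resources (connections, producers, ...) can be opened once in
// this function and reused for every row of the partition.
val handlePartition: Iterator[Row] => Unit = rows =>
  rows.foreach(row => println(row.getLong(0)))

df.foreachPartition(handlePartition)
```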

Re: How to properly execute `foreachPartition` in Spark 2.2

2017-12-18 Thread Liana Napalkova
I need to first read from a Kafka queue into a DataFrame. Then I should perform some transformations on the data. Finally, for each row in the DataFrame I should conditionally apply a KafkaProducer in order to send some data to Kafka. So, I am both consuming and producing the data from/to

Re: SANSA 0.3 (Scalable Semantic Analytics Stack) Released

2017-12-18 Thread Hajira Jabeen
Hi Timur, Thanks for your interest in SANSA. The intermediate results are mostly stored in RDDs (sometimes in Parquet files). The use of GraphFrames is planned, as they are not yet officially integrated with Spark. Please feel free to contact us in case of any questions. Regards Hajira

Re: How to properly execute `foreachPartition` in Spark 2.2

2017-12-18 Thread Silvio Fiorito
Why don’t you just use the Kafka sink for Spark 2.2? https://spark.apache.org/docs/2.2.0/structured-streaming-kafka-integration.html#creating-a-kafka-sink-for-streaming-queries From: Liana Napalkova Date: Monday, December 18, 2017 at 9:45 AM To:

How to properly execute `foreachPartition` in Spark 2.2

2017-12-18 Thread Liana Napalkova
Hi, I wonder how to properly execute `foreachPartition` in Spark 2.2. Below I explain the problem in detail. I appreciate any help. In Spark 1.6 I was doing something similar to this: DstreamFromKafka.foreachRDD(session => { session.foreachPartition { partitionOfRecords =>

Re: flatMap() returning large class

2017-12-18 Thread Richard Garris
Hi Don, It’s not so much map() vs flatMap(). You can return a collection and have Spark flatten the result. My point was more to change from Seq[BigDataStructure] to Seq[SmallDataStructure]. If the use case is really storing image data, I would try to use Seq[Vector] and store the values as a
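A minimal sketch of "return a collection and have Spark flatten the result", with hypothetical placeholder case classes standing in for the large and small structures discussed here.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical placeholders for the structures discussed in the thread.
case class BigRecord(id: Long, pixels: Array[Double])
case class SmallRecord(id: Long, value: Double)

object FlatMapSmall {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("flatmap-small").master("local[*]").getOrCreate()

    val big = spark.sparkContext.parallelize(Seq(
      BigRecord(1L, Array(0.1, 0.2)),
      BigRecord(2L, Array(0.3))
    ))

    // flatMap returns a collection of small records per input element, and
    // Spark flattens the nested collections into a single RDD of SmallRecord.
    val small = big.flatMap { r => r.pixels.map(p => SmallRecord(r.id, p)) }

    small.collect().foreach(println)
    spark.stop()
  }
}
```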

Re: Windows10 + pyspark + ipython + csv file loading with timestamps

2017-12-18 Thread Szuromi Tamás
Hi Esa, I'm using it like this: https://gist.github.com/tromika/1cda392242fdd66befe7970d80380216 cheers, 2017-12-16 11:04 GMT+01:00 Esa Heikkinen : > Hi > > Does anyone have any hints or example code for how to get the combination: > Windows10 + pyspark + ipython notebook +

Spark - Livy - Hive Table User

2017-12-18 Thread Sudha KS
The tables created are owned by the user livy: drwxrwxrwx+ - livy hdfs 0 2017-12-18 09:38 /apps/hive/warehouse/dev.db/tbl 1. Df.write.saveAsTable() 2. spark.sql("CREATE TABLE tbl (key INT, value STRING)") whereas a table created from the Hive shell is owned by 'hive'/the proxy user:

Re: SANSA 0.3 (Scalable Semantic Analytics Stack) Released

2017-12-18 Thread Timur Shenkao
Hello, Thank you for the very interesting work! The questions are: 1) Where do you store final or intermediate results? Parquet, JanusGraph, Cassandra? 2) Is there integration with Spark GraphFrames? Sincerely yours, Timur On Mon, Dec 18, 2017 at 9:21 AM, Hajira Jabeen

Re: Several Aggregations on a window function

2017-12-18 Thread Julien CHAMP
I've been looking at several solutions, but I can't find an efficient way to compute many window functions (optimized computation or efficient parallelism). Am I the only one interested in this? Regards, Julien On Fri, Dec 15, 2017 at 21:34, Julien CHAMP

SANSA 0.3 (Scalable Semantic Analytics Stack) Released

2017-12-18 Thread Hajira Jabeen
Dear all, The Smart Data Analytics group [1] is happy to announce SANSA 0.3 - the third release of the Scalable Semantic Analytics Stack. SANSA employs distributed computing via Apache Spark and Flink in order to allow scalable machine learning, inference and querying capabilities for large