Hello, I have a question. We are seeing the below exceptions, and at the moment
we are enabling a JVM profiler to look into GC activity on the workers. If you
have any other suggestions, please do let us know. We don't just want to
increase the RPC timeout (from 120 to, say, 600 sec); we want to get to the
reason why the workers time out.
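A minimal sketch of one way to enable GC logging on the executors before resorting to a larger timeout (assuming the 120s value refers to spark.network.timeout; the flags and app name are illustrative):

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: surface GC pauses on the executors instead of only raising the timeout.
val conf = new SparkConf()
  .setAppName("gc-profiling")   // illustrative app name
  .set("spark.executor.extraJavaOptions",
    "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps")
// .set("spark.network.timeout", "600s")   // the knob under discussion, deliberately left alone
val sc = new SparkContext(conf)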
I looked back into this today. I made some changes last week to the
application to allow for not only compatibility with Spark 1.5.2, but also
backwards compatibility with Spark 1.4.1 (the version our current deployment
uses). The changes mostly involved changing dependencies from compile to
Apologies in advance if someone has already asked and addressed this
question.
In Spark Streaming, how can I programmatically get the batch statistics
like scheduling delay, total delay and processing time (they are shown in the
Streaming tab of the job UI)? I need such information to raise alerts in some
Hi Hemant, thanks very much. Can we use SnappyData on YARN? My Spark jobs run
in yarn-client mode. Please guide.
On Mon, Feb 8, 2016 at 9:46 AM, Hemant Bhanawat
wrote:
> You may want to have a look at spark druid project already in progress:
>
It's interesting to see what spark dev people will say.
Corey, do you have the presentation available online?
On 8 February 2016 at 05:16, Corey Nolet wrote:
> Charles,
>
> Thank you for chiming in and I'm glad someone else is experiencing this
> too and not just me. I know very
Thanks Luciano, now it looks like I'm the only guy who has this issue. My
options have narrowed down to upgrading my Spark to 1.6.0, to see if this issue
goes away.
—
Cheers,
Todd Leo
On Mon, Feb 8, 2016 at 2:12 PM Luciano Resende wrote:
> I tried in both 1.5.0, 1.6.0 and
I've found the trigger of my issue: if I start my spark-shell or submit via
spark-submit with --conf
spark.serializer=org.apache.spark.serializer.KryoSerializer, the DataFrame
content comes out wrong, as I described earlier.
On Mon, Feb 8, 2016 at 5:42 PM SLiZn Liu wrote:
>
Hi all,
I have asked this question here on StackOverflow:
http://stackoverflow.com/questions/35222365/spark-sql-hivethriftserver2-get-liststring-from-cassandra-in-squirrelsql
But I'm hoping for more luck from this group. When I write a Java SparkSQL
application to query a
Hi
I have a DataFrame df and I use df.describe() to get the stats values, but I'm
not able to parse and extract all the individual information. Please help.
--
Thanks and Regards
Arun
In Python, concatenating two lists can be done simply using the + operator.
I'm assuming the RDD you're mapping over consists of tuples:
map(lambda x: x[0] + x[1])
Now, using DirectStream I am able to process 2 million messages from a
20-partition topic in a batch interval of 2000ms.
Finally figured out that the Kafka producer from a source system is sending the
same topic name instead of the key in KeyedMessage. It could put messages
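For reference, a minimal sketch of the old Kafka 0.8 producer API showing where the key belongs in KeyedMessage (broker address, topic, key and payload are all illustrative):

import java.util.Properties
import kafka.producer.{KeyedMessage, Producer, ProducerConfig}

val props = new Properties()
props.put("metadata.broker.list", "localhost:9092")             // illustrative broker
props.put("serializer.class", "kafka.serializer.StringEncoder")
val producer = new Producer[String, String](new ProducerConfig(props))
// KeyedMessage(topic, key, message): the second argument is the partitioning key,
// not a repeat of the topic name.
producer.send(new KeyedMessage[String, String]("myTopic", "someKey", "payload"))
producer.close()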
Hi
I'm using a SQL query to find the percentile value. Are there any predefined
functions for percentile calculation?
--
Thanks and Regards
Arun
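One possible approach, sketched under the assumption that a HiveContext is available (percentile_approx is a Hive UDAF; the table and column names are made up):

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)   // sc from the spark-shell
// approximate 95th percentile of a numeric column
hiveContext.sql("SELECT percentile_approx(value, 0.95) AS p95 FROM events").show()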
Hi All,
A long-running Spark job on YARN throws the below exception after running
for a few days.
yarn.ApplicationMaster: Reporter thread fails 1 time(s) in a row.
org.apache.hadoop.yarn.exceptions.YarnException: *No AMRMToken found* for
user prabhu at
SnappyData's deployment is different from how Spark is deployed. See
http://snappydatainc.github.io/snappydata/deployment/ and
http://snappydatainc.github.io/snappydata/jobs/.
For further questions, you can join us on stackoverflow
http://stackoverflow.com/questions/tagged/snappydata.
Hemant
Hi,
Please use DataFrame#repartition.
On Tue, Feb 9, 2016 at 7:30 AM, Cesar Flores wrote:
>
> I have a data frame which I sort using orderBy function. This operation
> causes my data frame to go to a single partition. After using those
> results, I would like to re-partition to
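A short sketch of what that could look like at the DataFrame level (column name and partition count are illustrative):

// sort, then spread the result back out without dropping to the RDD API
val sorted = df.orderBy("someColumn")
val repartitioned = sorted.repartition(100)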
Finally I figured out the problem and fixed it.
There was some inconsistency between my .ivy2 and .m2 repositories. Spark
resolves the dependencies using metadata in ivy2/cache and does not verify
their real location. That was why Spark resolved jackson-core-asl from the
local-m2-cache. But when Spark tried to
Hi Dhimant,
As I had indicated in my next mail, my problem was due to the disk getting full
with log messages (these were dumped onto the slaves) and did not have
anything to do with the content pushed into S3. So it looks like this error
message is very generic and is thrown for various reasons. You
I guess the problem is:
dummy.df <- withColumn(dataframe,
                       paste0(colnames(cat.column), j),
                       ifelse(column[[1]] == levels(as.factor(unlist(cat.column)))[j], 1, 0))
dataframe <- dummy.df
Once dataframe is re-assigned to reference a new DataFrame in each iteration,
the column variable has to be
+ Spark-Dev
On Tue, Feb 9, 2016 at 10:04 AM, Prabhu Joseph
wrote:
> Hi All,
>
> A long running Spark job on YARN throws below exception after running
> for few days.
>
> yarn.ApplicationMaster: Reporter thread fails 1 time(s) in a row.
>
In the "new" ALS intermediate RDDs (including the ratings input RDD after
transforming to block-partitioned ratings) is cached using
intermediateRDDStorageLevel, and you can select the final RDD storage level
(for user and item factors) using finalRDDStorageLevel.
The old MLLIB API now calls the
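A rough sketch of how those storage levels could be set on the builder-style MLlib ALS (ratings, rank and iteration count are placeholders):

import org.apache.spark.mllib.recommendation.{ALS, Rating}
import org.apache.spark.storage.StorageLevel

// ratings: RDD[Rating] prepared elsewhere
val model = new ALS()
  .setRank(10)
  .setIterations(10)
  .setIntermediateRDDStorageLevel(StorageLevel.MEMORY_AND_DISK)
  .setFinalRDDStorageLevel(StorageLevel.MEMORY_AND_DISK)
  .run(ratings)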
I have a Spark Streaming service where I am processing and detecting
anomalies on the basis of an offline-generated model. I feed data into
this service from a log file, which is streamed using the following command:
tail -f | nc -lk
Here the Spark Streaming service is taking data from
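A minimal sketch of the receiving side, assuming nc is listening on port 9999 (the port was omitted in the command above) and sc already exists:

import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(2))
val lines = ssc.socketTextStream("localhost", 9999)   // assumed host/port
lines.foreachRDD(rdd => println(s"received ${rdd.count()} lines"))
ssc.start()
ssc.awaitTermination()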
Hi Arunkumar,
From the Scala documentation, it's recommended to use the agg function for
performing any actual statistics programmatically on your data.
df.describe() is meant only for data exploration.
See Aggregator here:
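For illustration, a small sketch of pulling individual statistics with agg instead of parsing describe() output (the column name is assumed):

import org.apache.spark.sql.functions.{max, mean, min}

val stats = df.agg(mean("value").as("mean"), min("value").as("min"), max("value").as("max"))
stats.show()
val meanValue = stats.first().getDouble(0)   // individual values are now directly accessible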
I sure do! [1] And yes, I'm really hoping they will chime in; otherwise I
may dig a little deeper myself and start posting some JIRA tickets.
[1] http://www.slideshare.net/cjnolet
On Mon, Feb 8, 2016 at 3:02 AM, Igor Berman wrote:
> It's interesting to see what spark dev
Sorry, same expected results with trunk and Kryo serializer
On Mon, Feb 8, 2016 at 4:15 AM, SLiZn Liu wrote:
> I’ve found the trigger of my issue: if I start my spark-shell or submit
> by spark-submit with --conf
>
Hi All,
How do I change the log level for a running Spark Streaming job? Any help
will be appreciated.
Thanks,
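One small sketch of how this could be done from inside the application itself (assumes Spark 1.4+, where SparkContext#setLogLevel is available, and ssc is the StreamingContext):

// switch the log level at runtime without editing log4j.properties
ssc.sparkContext.setLogLevel("WARN")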
Are there any examples of how to implement the onEnvironmentUpdate method for
a custom listener?
Thanks,
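A bare-bones sketch of a custom listener overriding onEnvironmentUpdate (what it does with the details is purely illustrative):

import org.apache.spark.scheduler.{SparkListener, SparkListenerEnvironmentUpdate}

class EnvironmentUpdateListener extends SparkListener {
  override def onEnvironmentUpdate(update: SparkListenerEnvironmentUpdate): Unit = {
    // environmentDetails groups properties into sections such as "JVM Information"
    update.environmentDetails.getOrElse("JVM Information", Seq.empty).foreach {
      case (key, value) => println(s"$key -> $value")
    }
  }
}
// registration: sc.addSparkListener(new EnvironmentUpdateListener())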
Hello all,
Could someone please help me figure out what's wrong with my query that
I'm running over Parquet tables? The query has the following form:
weird_query = "SELECT a._example.com/aa/1.1/aa_,
b._example.com/bb/1.2/bb_ FROM www$aa@aa a LEFT JOIN www$bb@bb b ON
From within a Spark job you can use a periodic StreamingListener:
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}
ssc.addStreamingListener(new PeriodicStatisticsListener(Seconds(60)))
class PeriodicStatisticsListener(timePeriod: Duration) extends StreamingListener {
  private val logger = LoggerFactory.getLogger("Application")
  // completing the truncated override as a sketch: log batch statistics when each batch finishes
  override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
    val info = batchCompleted.batchInfo
    logger.info(s"scheduling delay: ${info.schedulingDelay}, " +
      s"processing time: ${info.processingDelay}, total delay: ${info.totalDelay}")
  }
}
I am using the Multilayer Perceptron Classifier. In each training instance
there are multiple 1.0 values in the output vector of the Multilayer Perceptron
Classifier. This is necessary. With a small number of training data I am
getting the following error:
*ERROR LBFGS: Failure again! Giving up and returning.
I had similar problems with multi-part uploads. In my case the real error
was something else which was being masked by this issue
https://issues.apache.org/jira/browse/SPARK-6560. In the end this bad
digest exception was a side effect and not the original issue. For me it
was some library version
Spark Summit East is just 10 days away and we are almost sold out! One of
the highlights this year will focus on how Spark is being used across
businesses to solve both big and small data needs. Check out the full
agenda here: https://spark-summit.org/east-2016/schedule/
Use "ApacheList" for 30%
I have a data frame which I sort using orderBy function. This operation
causes my data frame to go to a single partition. After using those
results, I would like to re-partition to a larger number of partitions.
Currently I am just doing:
val rdd = df.rdd.coalesce(100, true) //df is a dataframe
I am storing a model in s3 in this path:
"bucket_name/p1/models/lr/20160204_0410PM/ser" and the structure of the
saved dir looks like this:
1. bucket_name/p1/models/lr/20160204_0410PM/ser/data -> _SUCCESS,
_metadata, _common_metadata
and
When using ALS from mllib, would it be better/recommended to cache the ratings
RDD?
I'm asking because when predicting products for users (for example) it is
recommended to cache product/user matrices.
Thank you,
At least it works for me, though; I temporarily disabled the Kryo serializer
until we upgrade to 1.6.0. Appreciate your update. :)
Luciano Resende wrote on Tue, Feb 9, 2016 at 02:37:
> Sorry, same expected results with trunk and Kryo serializer
>
> On Mon, Feb 8, 2016 at 4:15 AM, SLiZn Liu