From: Aaron Jackson [mailto:ajack...@pobox.com]
Sent: Tuesday, July 19, 2016 7:17 PM
To: user
Subject: Heavy Stage Concentration - Ends With Failure

Hi,

I have a cluster with 15 nodes, of which 5 are HDFS nodes. I kick off a job that creates some 120 stages. Eventually, the active and pending stages reduce down to a small bottleneck and, without fail, the 10 (or so) running tasks in that bottleneck are always allocated to the same executor on the same host. Sooner or later it runs out of memory, or some other resource, and falls over, and then the tasks are reallocated to another executor.

Why do we see such heavy concentration of tasks onto a single executor when other executors are free? Were the tasks assigned to an executor when the job was decomposed into stages?
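One common cause of this pattern (not confirmed in the thread, just a frequent culprit) is key skew: a hash partitioner sends every record with a "hot" key to the same partition, so one task, and therefore one executor, ends up with almost all the data. A minimal pure-Python sketch of the effect, using Python's built-in hash on integer keys in place of Spark's HashPartitioner:

```python
def partition_sizes(keys, num_partitions):
    """Simulate hash partitioning (like Spark's HashPartitioner):
    each key is routed to partition hash(key) % num_partitions."""
    sizes = [0] * num_partitions
    for k in keys:
        sizes[hash(k) % num_partitions] += 1
    return sizes

# 100 records share one hot key (7); 12 records use distinct keys.
keys = [7] * 100 + list(range(12))
sizes = partition_sizes(keys, num_partitions=4)
# The partition that owns key 7 receives all 100 hot records, so the
# executor running that one task becomes the bottleneck while the rest idle.
```

If the skew hypothesis holds, the fix is usually salting the hot key or repartitioning on a better-distributed key, rather than adding executors.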
From: Aaron Perrin [mailto:aper...@gravyanalytics.com]
Sent: Tuesday, January 31, 2017 9:42 AM
To: user
Subject: Multiple quantile calculations

I want to calculate quantiles on two different columns. I know that I can calculate them with two separate operations; however, for performance reasons, I'd like to calculate both with one operation. Is this possible with the Dataset API? I'm assuming that it isn't. But if it isn't, is it possible to calculate both in one pass, assuming that I made some code changes? I briefly looked at the approxQuantile code, but I haven't dug into the algorithm.
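For what it's worth, if upgrading is an option: later Spark releases (2.2+, if memory serves) added an approxQuantile overload that accepts a list of column names and computes all of them in a single pass. Spark's own implementation is a Greenwald-Khanna variant; the sketch below does NOT reproduce it, it only illustrates the general single-pass idea with per-column reservoir sampling, and all names are illustrative:

```python
import random

def one_pass_quantiles(rows, cols, probs, sample_size=1000, seed=42):
    """Reservoir-sample every requested column during a SINGLE scan of
    the rows, then take quantiles of each sample. One pass serves any
    number of columns; accuracy depends on sample_size."""
    rng = random.Random(seed)
    reservoirs = {c: [] for c in cols}
    n = 0
    for row in rows:                      # the single pass over the data
        n += 1
        for c in cols:
            r = reservoirs[c]
            if len(r) < sample_size:
                r.append(row[c])
            else:
                j = rng.randrange(n)      # standard reservoir replacement
                if j < sample_size:
                    r[j] = row[c]
    result = {}
    for c in cols:
        s = sorted(reservoirs[c])
        result[c] = [s[int(p * (len(s) - 1))] for p in probs]
    return result

rows = [{"price": i, "qty": 2 * i} for i in range(1001)]
q = one_pass_quantiles(rows, ["price", "qty"], [0.25, 0.5, 0.75])
```

The same trick is why a multi-column approxQuantile can be cheaper than two separate calls: the expensive part is scanning the data, and the per-column sketches are updated together during that one scan.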
From: 萝卜丝炒饭 [mailto:1427357...@qq.com]
Sent: Sunday, May 21, 2017 8:15 PM
To: user
Subject: Are tachyon and akka removed from 2.1.1 please

Hi all,

I read a paper about the source code that was based on version 1.2; it refers to Tachyon and Akka. When I read the 2.1 code, I cannot find the code for Akka or Tachyon. Have Tachyon and Akka been removed as of 2.1.1?
From: Arun [mailto:arunbm...@gmail.com]
Sent: Saturday, May 20, 2017 9:48 PM
To: user@spark.apache.org
Subject: RMSE recommender system

Hi all,

I am new to machine learning and am working on a recommender system. On the training dataset the RMSE is 0.08, while on the test data it is 2.345. What conclusion should I draw, and what steps can I take to improve?

Sent from Samsung tablet
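A training RMSE of 0.08 against a test RMSE of 2.345 is the classic signature of overfitting: the model fits the training interactions far better than unseen ones. For an ALS-style recommender, the usual levers are stronger regularization (regParam), fewer latent factors (rank), and cross-validation to pick both. For reference, RMSE itself is just the square root of the mean squared prediction error:

```python
import math

def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    assert len(actual) == len(predicted) and actual
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

# A large gap between rmse on training data and rmse on held-out test data
# means the model memorized the training set rather than generalizing.
```

Because RMSE is in the same units as the ratings, a test RMSE of 2.345 on (say) a 1-5 rating scale means predictions are off by roughly half the scale on average, which confirms the model is not yet usable on new data.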
From: Steffen Schmitz [mailto:steffenschm...@hotmail.de]
Sent: Thursday, May 25, 2017 3:34 AM
To: ramnavan
Cc: user@spark.apache.org
Subject: Re: Questions regarding Jobs, Stages and Caching
From: 颜发才(Yan Facai) [mailto:facai@gmail.com]
Sent: Wednesday, June 7, 2017 4:24 AM
To: kundan kumar
Cc: spark users
Subject: Re: Convert the feature vector to raw data

Hi, kumar.

How about removing the `select` in your code? Namely:

Dataset result = model.transform(testData);
result.show(1000, false);

On Wed, Jun 7, 2017 at 5:00 PM, kundan kumar wrote:
> I am using
>
> Dataset result = model.transform(testData).select("probability", "label", "features");
> result.show(1000, false);
>
> In this case the feature vector is being printed as output. Is there a way that my original raw data gets printed instead of the feature vector, or is there a way to reverse-extract my raw data from the feature vector? All of the features in my dataset are categorical in nature.
>
> Thanks,
> Kundan
From: kundan kumar [mailto:iitr.kun...@gmail.com]
Sent: Wednesday, June 7, 2017 5:15 AM
To: 颜发才(Yan Facai)
Cc: spark users
Subject: Re: Convert the feature vector to raw data

Hi Yan,

This doesn't work.

Thanks,
Kundan

On Wed, Jun 7, 2017 at 2:53 PM, 颜发才(Yan Facai) wrote:
> Hi, kumar.
> How about removing the `select` in your code? Namely:
>
> Dataset result = model.transform(testData);
> result.show(1000, false);
From: kundan kumar [mailto:iitr.kun...@gmail.com]
Sent: Wednesday, June 7, 2017 4:01 AM
To: spark users
Subject: Convert the feature vector to raw data

I am using

Dataset result = model.transform(testData).select("probability", "label", "features");
result.show(1000, false);

In this case the feature vector is being printed as output. Is there a way that my original raw data gets printed instead of the feature vector, or is there a way to reverse-extract my raw data from the feature vector? All of the features in my dataset are categorical in nature.

Thanks,
Kundan
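Two things are worth noting here. First, transform() appends its output columns rather than replacing the input ones, so the original raw columns should still be present in the result and can simply be included in the select. Second, since the features are categorical, they were presumably encoded with StringIndexer, and Spark ML ships an IndexToString transformer that maps indices back to the stored labels. A toy pure-Python analog of that index/label round trip (names illustrative, not Spark's actual internals):

```python
from collections import Counter

def fit_string_indexer(values):
    """Toy analog of Spark ML's StringIndexer: by default, index 0 goes
    to the most frequent label, 1 to the next, and so on."""
    labels = [v for v, _ in Counter(values).most_common()]
    return {v: i for i, v in enumerate(labels)}, labels

def index_to_string(indices, labels):
    """Toy analog of Spark ML's IndexToString: map numeric indices back
    to the original categorical labels stored by the indexer."""
    return [labels[int(i)] for i in indices]

mapping, labels = fit_string_indexer(["a", "b", "a", "c", "a", "b"])
roundtrip = index_to_string([mapping["c"], mapping["a"]], labels)
```

The key point is that the reverse mapping only exists because the indexer stored its label array; a raw feature vector alone (e.g. after hashing or one-hot assembly without metadata) may not be reversible.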
From: Joel D [mailto:games2013@gmail.com]
Sent: Monday, May 29, 2017 9:04 PM
To: user@spark.apache.org
Subject: Schema Evolution Parquet vs Avro

Hi,

We are trying to come up with the best storage format for handling schema changes in ingested data. We noticed that both Avro and Parquet allow one to select columns by name instead of by index/position. However, we are inclined towards Parquet for better read performance, since it is columnar and we will be selecting a few columns instead of all of them. Data will be processed and saved to partitions, on which we will have Hive external tables.

Will Parquet be able to handle the following:
- renaming a column in the middle of the schema
- removing a column from the middle of the schema
- changing the data type of an existing column (int to bigint should be allowed, right?)

Please advise.

Thanks,
Sam
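Name-based resolution (which, as the message notes, both Parquet and Avro readers use) makes column removal and reordering safe, but it also means a renamed column looks to the reader like one dropped column plus one brand-new column: rows written before the rename will read back null under the new name. Type widening such as int to bigint is reader-dependent and worth testing with the exact Hive/Spark versions in play. A toy sketch of name-based projection (illustrative names, not any library's API):

```python
def project(record, reader_columns):
    """Name-based column resolution, in the spirit of Parquet/Avro readers:
    columns are matched by name, so removed or reordered columns are
    harmless, but a column missing from the stored data (newly added, or
    renamed after this row was written) reads back as None/null."""
    return {col: record.get(col) for col in reader_columns}

# Row written before the column "cnt" was renamed to "count":
old_row = {"id": 1, "cnt": 42}
resolved = project(old_row, ["id", "count"])
# Old data cannot be found under the new name, so "count" resolves to None.
```

This is why renames in evolving schemas are often handled by adding a new column and backfilling, or by a view that aliases the old name, rather than by an in-place rename.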