Re: Does Spark automatically run different stages concurrently when possible?

2015-01-19 Thread Ashish
For additional commands, e-mail: user-h...@spark.apache.org - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org -- thanks ashish Blog: http://www.ashishpaliwal.com

Re: Does Spark automatically run different stages concurrently when possible?

2015-01-20 Thread Ashish
, then it will be re-calculated (and persisted) after (4) is calculated. On Tue, Jan 20, 2015 at 3:38 AM, Ashish paliwalash...@gmail.com wrote: Sean, A related question. When to persist the RDD after step 2 or after Step 3 (nothing would happen before step 3 I assume)? On Mon, Jan 19, 2015 at 5:17 PM, Sean

Re: UnknownhostException : home

2015-01-19 Thread Ashish
? Thanks -- thanks ashish Blog: http://www.ashishpaliwal.com/blog My Photo Galleries: http://www.pbase.com/ashishpaliwal

Re: UnknownhostException : home

2015-01-19 Thread Ashish
the cause of this exception. Any help in this regard? Thanks -- thanks ashish Blog: http

Unsubscribe

2022-07-28 Thread Ashish
Unsubscribe Sent from my iPhone

Re: Problem in Spark Streaming

2014-06-10 Thread Ashish Rangole
Have you considered the garbage collection impact and if it coincides with your latency spikes? You can enable gc logging by changing Spark configuration for your job. Hi, as I searched the keyword Total delay in the console log, the delay keeps increasing. I am not sure what does this total delay

Specifying classpath

2014-08-26 Thread Ashish Jain
spark-submit to submit the job Thanks Ashish

Re: Specifying classpath

2014-08-27 Thread Ashish Jain
I solved this issue by putting hbase-protobuf in Hadoop classpath, and not in the spark classpath. export HADOOP_CLASSPATH=/path/to/jar/hbase-protocol-0.98.1-cdh5.1.0.jar On Tue, Aug 26, 2014 at 5:42 PM, Ashish Jain ashish@gmail.com wrote: Hello, I'm using the following version

When to start optimizing for GC?

2014-09-29 Thread Ashish Jain
Hello, I have written a standalone Spark job which I run through the Ooyala Job Server. The program is working correctly; now I'm looking into how to optimize it. My program without optimization took 4 hours to run. The first optimization of KryoSerializer and compiling regex patterns and reusing
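The regex-precompilation optimization mentioned above can be sketched in plain Java (the names and pattern here are illustrative, not from the original job): compile the Pattern once as a constant and reuse it for every record, instead of calling String.matches(), which recompiles the pattern on each call.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class RegexReuse {
    // Compiled once; java.util.regex.Pattern is thread-safe and can be
    // reused across all records processed by a task.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    public static boolean isNumeric(String s) {
        return DIGITS.matcher(s).matches();
    }

    public static void main(String[] args) {
        List<String> records = List.of("123", "abc", "42");
        // Reusing the precompiled pattern in the hot loop.
        List<String> numeric = records.stream()
                .filter(RegexReuse::isNumeric)
                .collect(Collectors.toList());
        System.out.println(numeric); // [123, 42]
    }
}
```

The same idea applies inside a Spark map or filter function: keep the compiled Pattern in a static field (or a transient lazily-initialized one) so each executor compiles it once rather than per record.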

Re: Spark inside Eclipse

2014-10-01 Thread Ashish Jain
your program like you would any normal program. Hope this helps. Thanks Ashish On Oct 1, 2014 4:35 PM, Sanjay Subramanian sanjaysubraman...@yahoo.com.invalid wrote: hey guys Is there a way to run Spark in local mode from within Eclipse. I am running Eclipse Kepler on a Macbook Pro

Re: Confusion over how to deploy/run JAR files to a Spark Cluster

2014-10-02 Thread Ashish Jain
Hello Mark, I am no expert but I can answer some of your questions. On Oct 2, 2014 2:15 AM, Mark Mandel mark.man...@gmail.com wrote: Hi, So I'm super confused about how to take my Spark code and actually deploy and run it on a cluster. Let's assume I'm writing in Java, and we'll take a

Re: partition size for initial read

2014-10-02 Thread Ashish Jain
If you are using textFile() to read data in, it also takes a parameter for the minimum number of partitions to create. Would that not work for you? On Oct 2, 2014 7:00 AM, jamborta jambo...@gmail.com wrote: Hi all, I have been testing repartitioning to ensure that my algorithms get similar

Re: Spark or MR, Scala or Java?

2014-11-23 Thread Ashish Rangole
This being a very broad topic, a discussion can quickly get subjective. I'll try not to deviate from my experiences and observations to keep this thread useful to those looking for answers. I have used Hadoop MR (with Hive, MR Java apis, Cascading and Scalding) as well as Spark (since v 0.6) in

Re: Loading RDDs in a streaming fashion

2014-12-02 Thread Ashish Rangole
This is a common use case and this is how Hadoop APIs for reading data work, they return an Iterator [Your Record] instead of reading every record in at once. On Dec 1, 2014 9:43 PM, Andy Twigg andy.tw...@gmail.com wrote: You may be able to construct RDDs directly from an iterator - not sure -

Re: spark-submit on YARN is slow

2014-12-05 Thread Ashish Rangole
Likely this is not the case here, yet one thing to point out with Yarn parameters like --num-executors is that they should be specified *before* the app jar and app args on the spark-submit command line; otherwise the app only gets the default number of containers, which is 2. On Dec 5, 2014 12:22 PM, Sandy
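The argument ordering described above can be sketched as a spark-submit invocation (the master, class name, jar, and app args below are hypothetical placeholders):

```shell
# Resource flags such as --num-executors must appear BEFORE the app jar;
# everything after the jar is passed through as application arguments.
spark-submit \
  --master yarn-cluster \
  --num-executors 10 \
  --executor-memory 4g \
  --class com.example.MyApp \
  my-app.jar appArg1 appArg2
```

If --num-executors were placed after my-app.jar, spark-submit would treat it as an application argument and the job would run with the default of 2 containers.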

Re: Why so many tasks?

2014-12-16 Thread Ashish Rangole
Take a look at CombineFileInputFormat. Repartition or coalesce could introduce shuffle I/O overhead. On Dec 16, 2014 7:09 AM, bethesda swearinge...@mac.com wrote: Thank you! I had known about the small-files problem in HDFS but didn't realize that it affected sc.textFile(). -- View

Error when running example (pi.py)

2015-02-08 Thread Ashish Kumar
Traceback (most recent call last): File "pi.py", line 29, in &lt;module&gt; sc = SparkContext(appName="PythonPi") File "/home/ashish/Downloads/spark-1.1.0-bin-hadoop2.4/python/pyspark/context.py", line 104, in __init__ SparkContext._ensure_initialized(self, gateway=gateway) File /home/ashish

Spark Distributed Join

2015-02-13 Thread Ashish Mukherjee
, there would be intermediate disk writes and data transfer to the Spark master. 2) Don't use Spark and do the same with some in-memory distributed engine like MemSQL or Redis. What's the experts' view on this? Regards, Ashish

Re: SaveAsTextFile to S3 bucket

2015-01-26 Thread Ashish Rangole
By default, the files will be created under the path provided as the argument for saveAsTextFile. This argument is treated as a folder in the bucket and the actual files are created in it with the naming convention part-n, where n is the output partition number. On Mon, Jan 26, 2015 at

ImportError: No module named pyspark, when running pi.py

2015-02-09 Thread Ashish Kumar
*Command:* sudo python ./examples/src/main/python/pi.py *Error:* Traceback (most recent call last): File "./examples/src/main/python/pi.py", line 22, in &lt;module&gt; from pyspark import SparkContext ImportError: No module named pyspark

Spark as a service

2015-03-24 Thread Ashish Mukherjee
Hello, As of now, if I have to execute a Spark job, I need to create a jar and deploy it. If I need to run a dynamically formed SQL from a Web application, is there any way of using SparkSQL in this manner? Perhaps, through a Web Service or something similar. Regards, Ashish

Re: Question about Data Sources API

2015-03-24 Thread Ashish Mukherjee
with grouping and sorting. Essentially, I am trying to evaluate if this API can give me much of what is possible with the Apache MetaModel project. Regards, Ashish On Tue, Mar 24, 2015 at 1:57 PM, Michael Armbrust mich...@databricks.com wrote: On Tue, Mar 24, 2015 at 12:57 AM, Ashish Mukherjee

Spark SQL and DataSources API roadmap

2015-03-27 Thread Ashish Mukherjee
Hello, Is there any published community roadmap for SparkSQL and the DataSources API? Regards, Ashish

SparkSQL production readiness

2015-02-28 Thread Ashish Mukherjee
Hi, I am exploring SparkSQL for my purposes of performing large relational operations across a cluster. However, it seems to be in alpha right now. Is there any indication when it would be considered production-level? I don't see any info on the site. Regards, Ashish

Running in-memory SQL on streamed relational data

2015-02-28 Thread Ashish Mukherjee
. Has anyone done this kind of thing with Spark before? i.e combination of SparkSQL with Streaming. Regards, Ashish

Re: Tools to manage workflows on Spark

2015-02-28 Thread Ashish Nigam
...@gmail.com wrote: Thanks, Ashish! Is Oozie integrated with Spark? I knew it can accommodate some Hadoop jobs. On Sat, Feb 28, 2015 at 6:07 PM, Ashish Nigam ashnigamt...@gmail.com mailto:ashnigamt...@gmail.com wrote: Qiang, Did you look at Oozie? We use oozie to run spark jobs in production

Re: Tools to manage workflows on Spark

2015-02-28 Thread Ashish Nigam
Qiang, Did you look at Oozie? We use oozie to run spark jobs in production. On Feb 28, 2015, at 2:45 PM, Qiang Cao caoqiang...@gmail.com wrote: Hi Everyone, We need to deal with workflows on Spark. In our scenario, each workflow consists of multiple processing steps. Among different

Re: Unable to find org.apache.spark.sql.catalyst.ScalaReflection class

2015-02-28 Thread Ashish Nigam
was in the classpath ? Cheers On Sat, Feb 28, 2015 at 9:18 AM, Ashish Nigam ashnigamt...@gmail.com mailto:ashnigamt...@gmail.com wrote: Hi, I wrote a very simple program in scala to convert an existing RDD to SchemaRDD. But createSchemaRDD function is throwing exception Exception in thread

Spark with data on NFS v HDFS

2015-03-05 Thread Ashish Mukherjee
to read from because of better reliability compared to NFS? Should data be stored on local FS (not NFS) only for Spark jobs which run on single machine? Regards, Ashish

Unable to find org.apache.spark.sql.catalyst.ScalaReflection class

2015-02-28 Thread Ashish Nigam
classpath [.] not found Here's more info on the versions I am using - &lt;scala.binary.version&gt;2.11&lt;/scala.binary.version&gt; &lt;spark.version&gt;1.2.1&lt;/spark.version&gt; &lt;scala.version&gt;2.11.5&lt;/scala.version&gt; Please let me know how can I resolve this problem. Thanks Ashish

Re: Unable to find org.apache.spark.sql.catalyst.ScalaReflection class

2015-02-28 Thread Ashish Nigam
Also, can scala version play any role here? I am using scala 2.11.5 but all spark packages have dependency to scala 2.11.2 Just wanted to make sure that scala version is not an issue here. On Sat, Feb 28, 2015 at 9:18 AM, Ashish Nigam ashnigamt...@gmail.com wrote: Hi, I wrote a very simple

Re: randomSplit instead of a huge map reduce ?

2015-02-20 Thread Ashish Rangole
Is there a check you can put in place to not create pairs that aren't in your set of 20M pairs? Additionally, once you have your arrays converted to pairs you can do aggregateByKey with each pair being the key. On Feb 20, 2015 1:57 PM, shlomib shl...@summerhq.com wrote: Hi, I am new to Spark

updateStateByKey and invFunction

2015-02-24 Thread Ashish Sharma
? Thanks Ashish

Re: updateStateByKey and invFunction

2015-02-24 Thread Ashish Sharma
But how will I specify my state there? On Tue, Feb 24, 2015 at 12:50 AM Arush Kharbanda ar...@sigmoidanalytics.com wrote: You can use a reduceByKeyAndWindow with your specific time window. You can specify the inverse function in reduceByKeyAndWindow. On Tue, Feb 24, 2015 at 1:36 PM, Ashish

Spark Application Hung

2015-03-24 Thread Ashish Rawat
the system as this error may not be easily reproducible but resolving such issues is critical to take our application to production. Would appreciate some quick help! Regards, Ashish

Question about Data Sources API

2015-03-24 Thread Ashish Mukherjee
source and all the data pulled into Spark for other processing? Regards, Ashish

How Spark Execute chaining vs no chaining statements

2015-06-23 Thread Ashish Soni
Hi All , What is the difference between the two forms below, in terms of execution on a cluster with 1 or more worker nodes? rdd.map(...).map(...)...map(..) vs val rdd1 = rdd.map(...) val rdd2 = rdd1.map(...) val rdd3 = rdd2.map(...) Thanks, Ashish
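Both forms build the same lazy lineage in Spark: the intermediate vals merely name stages of the pipeline, and nothing executes until an action is called, so the cluster runs them identically. As a rough analogy under the same laziness principle, the equivalence can be demonstrated with plain Java streams (this is an illustrative sketch, not Spark code):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ChainingDemo {
    public static void main(String[] args) {
        List<Integer> data = List.of(1, 2, 3);

        // Chained form: one pipeline, evaluated lazily.
        List<Integer> chained = data.stream()
                .map(x -> x + 1)
                .map(x -> x * 2)
                .collect(Collectors.toList());

        // Named-intermediate form: the variables only name pipeline stages;
        // no work happens until the terminal collect().
        Stream<Integer> s1 = data.stream().map(x -> x + 1);
        Stream<Integer> s2 = s1.map(x -> x * 2);
        List<Integer> stepwise = s2.collect(Collectors.toList());

        // Both pipelines produce the same result: [4, 6, 8].
        System.out.println(chained.equals(stepwise)); // true
    }
}
```

In Spark the analogous "terminal operation" is an action such as collect(), count(), or saveAsTextFile(); until one is called, both styles are just descriptions of the same DAG.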

Spark 1.4 History Server - HDP 2.2

2015-06-20 Thread Ashish Soni
, Ashish

Re: RE: Spark or Storm

2015-06-19 Thread Ashish Soni
is evaluating the framework and does not have enough time to validate all the use cases, but has to rely on the documentation. Ashish On Fri, Jun 19, 2015 at 7:10 AM, bit1...@163.com bit1...@163.com wrote: I think your observation is correct, you have to take care of these replayed data at your end, e.g., each

Spark 1.4 on HortonWork HDP 2.2

2015-06-19 Thread Ashish Soni
Hi , Is anyone able to install Spark 1.4 on HDP 2.2? Please let me know how i can do the same. Ashish

Re: Spark 1.4 on HortonWork HDP 2.2

2015-06-19 Thread Ashish Soni
I do not know where to start, as Spark 1.2 comes bundled with HDP 2.2 but i want to use 1.4 and i do not know how to update it to 1.4 Ashish On Fri, Jun 19, 2015 at 8:26 AM, ayan guha guha.a...@gmail.com wrote: what problem are you facing? are you trying to build it yourself or getting pre-built

Spark and HDFS ( Worker and Data Nodes Combination )

2015-06-22 Thread Ashish Soni
Hi All , What is the best way to install a Spark Cluster alongside a Hadoop Cluster? Any recommendation for the below deployment topology will be a great help. Also, is it necessary to put the Spark Worker on the DataNodes, so that when it reads a block from HDFS it will be local to the Server / Worker, or

Kafka Direct Stream - Custom Serialization and Deserilization

2015-06-26 Thread Ashish Soni
Hi , If i have the below data format, how can i use the kafka direct stream to de-serialize it? I am not able to understand all the parameters i need to pass. Can some one explain what the arguments will be, as i am not clear about this: JavaPairInputDStream

Re: Kafka Direct Stream - Custom Serialization and Deserilization

2015-06-26 Thread Ashish Soni
my question is why there are two similar parameters, String.class and StringDecoder.class; what is the difference between them? Ashish On Fri, Jun 26, 2015 at 8:53 AM, Akhil Das ak...@sigmoidanalytics.com wrote: JavaPairInputDStream&lt;String, String&gt; messages = KafkaUtils.createDirectStream

spark streaming job fails to restart after checkpointing due to DStream initialization errors

2015-06-26 Thread Ashish Nigam
to resolve this issue? Thanks Ashish

Spark on Yarn - How to configure

2015-06-19 Thread Ashish Soni
which files need to be changed to make sure my master node is SparkMaster and slave nodes are 1,2,3, and how to tell / configure Yarn Ashish

WorkFlow Processing - Spark

2015-06-24 Thread Ashish Soni
will define their own logic, like custom code which we need to load inside a driver program ... Any idea of the best way to do this ... Ashish

Re: spark streaming job fails to restart after checkpointing due to DStream initialization errors

2015-06-26 Thread Ashish Nigam
. Post your code if you can't get it figured out. On Fri, Jun 26, 2015 at 3:45 PM, Ashish Nigam ashnigamt...@gmail.com wrote: I bring up spark streaming job that uses Kafka as input source. No data to process and then shut it down. And bring it back again. This time job does not start

spark streaming - checkpointing - looking at old application directory and failure to start streaming context

2015-06-10 Thread Ashish Nigam
) at org.apache.spark.streaming.StreamingContext$.getOrCreate(StreamingContext.scala:561) at Any idea on how to fix this issue? Thanks Ashish

Re: spark streaming - checkpointing - looking at old application directory and failure to start streaming context

2015-06-10 Thread Ashish Nigam
, 2015 at 9:18 AM, Akhil Das ak...@sigmoidanalytics.com wrote: Delete the checkpoint directory, you might have modified your driver program. Thanks Best Regards On Wed, Jun 10, 2015 at 9:44 PM, Ashish Nigam ashnigamt...@gmail.com wrote: Hi, If checkpoint data is already present in HDFS

Re: Spark or Storm

2015-06-17 Thread Ashish Soni
spark VS storm ( spark streaming ) and i am not able to see what is equivalent of Bolt in storm inside spark. Any help will be appreciated on this ? Thanks , Ashish

Twitter Heron: Stream Processing at Scale - Does Spark Address all the issues

2015-06-17 Thread Ashish Soni
Hi Sparkers , https://dl.acm.org/citation.cfm?id=2742788 Recently Twitter released a paper on Heron as a replacement for Apache Storm, and i would like to know whether Apache Spark currently suffers from the same issues they have outlined. Any input / thought will be helpful. Thanks, Ashish

Re: Spark or Storm

2015-06-17 Thread Ashish Soni
A stream can also be processed in micro-batches, which is the main idea behind Spark Streaming, so what is the difference? Ashish On Wed, Jun 17, 2015 at 9:04 AM, Enno Shioji eshi...@gmail.com wrote: PS just to elaborate on my first sentence, the reason Spark (not streaming) can offer

Re: Spark or Storm

2015-06-17 Thread Ashish Soni
As per my best understanding, Spark Streaming offers exactly-once processing; is this achieved only through updateStateByKey, or is there another way to do the same? Ashish On Wed, Jun 17, 2015 at 8:48 AM, Enno Shioji eshi...@gmail.com wrote: In that case I assume you need exactly once semantics

Re: Spark or Storm

2015-06-17 Thread Ashish Soni
would definitely pursue this route as our transformations are really simple. Best On Wed, Jun 17, 2015 at 10:26 PM, Ashish Soni asoni.le...@gmail.com wrote: My Use case is below We are going to receive lot of event as stream ( basically Kafka Stream ) and then we need to process

Re: Spark SQL v MemSQL/Voltdb

2015-05-28 Thread Ashish Mukherjee
to a clustered scenario, which is the right engine at various degrees of scale? Regards, Ashish On Fri, May 29, 2015 at 6:57 AM, Mohit Jaggi mohitja...@gmail.com wrote: I have used VoltDB and Spark. The use cases for the two are quite different. VoltDB is intended for transactions and also

RDD staleness

2015-05-31 Thread Ashish Mukherjee
Hello, Since RDDs are created from data from Hive tables or HDFS, how do we ensure they are invalidated when the source data is updated? Regards, Ashish

Spark SQL v MemSQL/Voltdb

2015-05-28 Thread Ashish Mukherjee
Hello, I was wondering if there is any documented comparison of SparkSQL with MemSQL/VoltDB kind of in-memory SQL databases. MemSQL etc. too allow queries to be run in a clustered environment. What is the major differentiation? Regards, Ashish

BroadCast Multiple DataFrame ( JDBC Tables )

2015-07-01 Thread Ashish Soni
Hi , I need to load 10 tables in memory and have them available to all the workers. Please let me know the best way to broadcast them; sc.broadcast(df) allows only one. Thanks,

Convert CSV lines to List of Objects

2015-07-01 Thread Ashish Soni
Hi , How can i use a map function in Java to convert all the lines of a csv file into a list of objects? Can some one please help... JavaRDD&lt;List&lt;Charge&gt;&gt; rdd = sc.textFile("data.csv").map(new Function&lt;String, List&lt;Charge&gt;&gt;() { @Override public List&lt;Charge&gt; call(String s) {
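The parsing logic itself can be kept Spark-free and unit-testable, then called from the map Function. A minimal sketch, assuming a hypothetical Charge class with an id and an amount, and simple comma-separated input with alternating id,amount pairs (no quoting or escaping handled):

```java
import java.util.ArrayList;
import java.util.List;

public class ChargeParser {
    // Hypothetical record type; the real Charge class may differ.
    public static class Charge {
        public final String id;
        public final double amount;
        public Charge(String id, double amount) {
            this.id = id;
            this.amount = amount;
        }
    }

    // Parses one CSV line into a list of Charge objects, assuming
    // alternating id,amount fields; a trailing unpaired field is ignored.
    public static List<Charge> parseLine(String line) {
        String[] fields = line.split(",");
        List<Charge> charges = new ArrayList<>();
        for (int i = 0; i + 1 < fields.length; i += 2) {
            charges.add(new Charge(fields[i].trim(),
                    Double.parseDouble(fields[i + 1].trim())));
        }
        return charges;
    }

    public static void main(String[] args) {
        List<Charge> parsed = parseLine("c1, 9.99, c2, 4.50");
        System.out.println(parsed.size()); // 2
    }
}
```

Inside Spark this could then be used as sc.textFile("data.csv").map(ChargeParser::parseLine) with Java 8 lambdas, or wrapped in an anonymous Function for older Java; keeping the parser as a plain static method makes it easy to test without a SparkContext.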

Spark SQL and Streaming - How to execute JDBC Query only once

2015-07-02 Thread Ashish Soni
Hi All , I have a stream of events coming in, and i want to fetch some additional data from the database based on the values in the incoming data. For e.g., below is the data coming in: loginName Email address city. Now for each login name i need to go to the oracle database and get the userId from

Spark-Submit / Spark-Shell Error Standalone cluster

2015-06-27 Thread Ashish Soni
Not sure what the issue is, but when i run spark-submit or spark-shell i am getting the below error: /usr/bin/spark-class: line 24: /usr/bin/load-spark-env.sh: No such file or directory Can some one please help Thanks,

Load Multiple DB Table - Spark SQL

2015-06-29 Thread Ashish Soni
("dbtable1", "(select * from test1)"); options.put("dbtable2", "(select * from test2)"); DataFrame jdbcDF = sqlContext.load("jdbc", options); Thanks, Ashish

How Will Spark Execute below Code - Driver and Executors

2015-07-06 Thread Ashish Soni
Hi All , If someone can help me understand which portion of the code gets executed on the Driver and which portion will be executed on the executors from the below code, it would be a great help. I have to load data from 10 tables and then use that data in various manipulations, and i am using SPARK SQL

DataFrame Find/Filter Based on Input - Inside Map function

2015-07-01 Thread Ashish Soni
Hi All , I have a DataFrame created as below: options.put("dbtable", "(select * from user) as account"); DataFrame accountRdd = sqlContext.read().format("jdbc").options(options).load(); and i have another RDD which contains the login name, and i want to find the userid from the above DF RDD and return

Re: DataFrame Filter Inside Another Data Frame Map

2015-07-01 Thread Ashish Soni
Thanks. So if i load some static data from a database and then i need to use that in my map function to filter records, what will be the best way to do it? Ashish On Wed, Jul 1, 2015 at 10:45 PM, Raghavendra Pandey raghavendra.pan...@gmail.com wrote: You cannot refer to one rdd inside another
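One common pattern consistent with the advice above (an RDD/DataFrame cannot be referenced inside another RDD's map) is to collect the small static table to the driver as a plain Map, broadcast it, and do the lookup locally inside the map or filter function. A Spark-free sketch of the lookup step, with hypothetical names (login name to user id):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class LookupFilter {
    // Stand-in for the value you'd collect from the database table and
    // broadcast to executors, e.g. via sc.broadcast(map) in real Spark code.
    static Map<String, Long> userIds = new HashMap<>();

    // The function you'd pass to filter(): a pure local map lookup,
    // with no reference to another RDD or DataFrame.
    public static boolean isKnownUser(String loginName) {
        return userIds.containsKey(loginName);
    }

    public static void main(String[] args) {
        userIds.put("alice", 1L);
        userIds.put("bob", 2L);
        List<String> logins = List.of("alice", "carol", "bob");
        List<String> known = logins.stream()
                .filter(LookupFilter::isKnownUser)
                .collect(Collectors.toList());
        System.out.println(known); // [alice, bob]
    }
}
```

This only works when the static side is small enough to fit in driver and executor memory; for large lookup tables a join between the two datasets is the usual alternative.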

JVM is not ready after 10 seconds.

2015-07-06 Thread Ashish Dutt
get the following error. sc=sparkR.init(master=spark://10.229.200.250:7377) Launching java with spark-submit command C:\spark-1.4.0\bin/bin/spark-submit.cmd sparkr-shell C:\Users\ASHISH~1\AppData\Local\Temp\Rtmp82kCxH\backend_port4281739d85 Error in sparkR.init(master = spark://10.229.200.250:7377

Re: JVM is not ready after 10 seconds

2015-07-06 Thread Ashish Dutt
Hello Shivaram, Thank you for your response. Being a novice at this stage can you also tell how to configure or set the execute permission for the spark-submit file? Thank you for your time. Sincerely, Ashish Dutt On Tue, Jul 7, 2015 at 9:21 AM, Shivaram Venkataraman shiva

Re: JVM is not ready after 10 seconds

2015-07-06 Thread Ashish Dutt
# spark.driver.memory 5g # spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers=one two three Sincerely, Ashish Dutt On Tue, Jul 7, 2015 at 9:30 AM, Ashish Dutt ashish.du...@gmail.com wrote: Hello Shivaram, Thank you for your response. Being a novice at this stage

DataFrame Filter Inside Another Data Frame Map

2015-07-01 Thread Ashish Soni
Hi All , I am not sure what is wrong with the below code, as it gives the below error when i access it inside the map but it works outside: JavaRDD&lt;Charge&gt; rdd2 = rdd.map(new Function&lt;Charge, Charge&gt;() { @Override public Charge call(Charge ch) throws Exception {

Re: spark streaming - checkpointing - looking at old application directory and failure to start streaming context

2015-06-11 Thread Ashish Nigam
Any idea why this happens? On Wed, Jun 10, 2015 at 9:28 AM, Ashish Nigam ashnigamt...@gmail.com wrote: BTW, I am using spark streaming 1.2.0 version. On Wed, Jun 10, 2015 at 9:26 AM, Ashish Nigam ashnigamt...@gmail.com wrote: I did not change driver program. I just shutdown the context

Re: Worker Machine running out of disk for Long running Streaming process

2015-08-22 Thread Ashish Rangole
Interesting. TD, can you please throw some light on why this is and point to the relevant code in Spark repo. It will help in a better understanding of things that can affect a long running streaming job. On Aug 21, 2015 1:44 PM, Tathagata Das t...@databricks.com wrote: Could you periodically

Java Streaming Context - File Stream use

2015-08-10 Thread Ashish Soni
Please help, as i am not sure what is incorrect with the below code; it gives me a compilation error in eclipse: SparkConf sparkConf = new SparkConf().setMaster("local[4]").setAppName("JavaDirectKafkaWordCount"); JavaStreamingContext jssc = new JavaStreamingContext(sparkConf,

PySpark in Pycharm- unable to connect to remote server

2015-08-05 Thread Ashish Dutt
) at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79) at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68) at py4j.GatewayConnection.run(GatewayConnection.java:207) at java.lang.Thread.run(Thread.java:745) Traceback (most recent call last): File C:/Users/ashish dutt/PycharmProjects

How to connect to remote HDFS programmatically to retrieve data, analyse it and then write the data back to HDFS?

2015-08-05 Thread Ashish Dutt
) at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79) at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68) at py4j.GatewayConnection.run(GatewayConnection.java:207) at java.lang.Thread.run(Thread.java:745) Traceback (most recent call last): File C:/Users/ashish dutt/PycharmProjects

Class Loading Issue - Spark Assembly and Application Provided

2015-07-21 Thread Ashish Soni
Hi All , I am having a class loading issue, as the Spark Assembly uses Google Guice internally and one of the jars i am using depends on sisu-guice-3.1.0-no_aop.jar. How do i load my class first so that it doesn't result in an error, and tell spark to load its assembly later on? Ashish

Re: Data Processing speed SQL Vs SPARK

2015-07-13 Thread Ashish Mukherjee
MySQL and PgSQL scale to millions. Spark or any distributed/clustered computing environment would be inefficient for the kind of data size you mention. That's because of coordination of processes, moving data around etc. On Mon, Jul 13, 2015 at 5:34 PM, Sandeep Giri sand...@knowbigdata.com wrote:

Re: SparkR Error in sparkR.init(master=“local”) in RStudio

2015-07-13 Thread Ashish Dutt
()? where is it in windows environment Thanks for your help Sincerely, Ashish Dutt On Mon, Jul 13, 2015 at 3:48 PM, Sun, Rui rui@intel.com wrote: Hi, Kachau, If you are using SparkR with RStudio, have you followed the guidelines in the section Using SparkR from RStudio in https

Re: Is it possible to change the default port number 7077 for spark?

2015-07-13 Thread Ashish Dutt
Hello Arun, Thank you for the descriptive response. And thank you for providing the sample file too. It certainly is a great help. Sincerely, Ashish On Mon, Jul 13, 2015 at 10:30 PM, Arun Verma arun.verma...@gmail.com wrote: PFA sample file On Mon, Jul 13, 2015 at 7:37 PM, Arun Verma

BroadCast on Interval ( eg every 10 min )

2015-07-16 Thread Ashish Soni
Hi All , How can i broadcast a data change to all the executors every 10 min or 1 min? Ashish

XML Parsing

2015-07-19 Thread Ashish Soni
Hi All , I have an XML file with the same tag repeated multiple times as below. Please suggest what would be the best way to process this data inside spark ... How can i extract each opening and closing tag and process them, or how can i combine multiple lines into a single line: &lt;review&gt; ... &lt;/review&gt; &lt;review&gt;

Spark 1.5.1+Hadoop2.6 .. unable to write to S3 (HADOOP-12420)

2015-10-22 Thread Ashish Shrowty
://issues.apache.org/jira/browse/HADOOP-12420) My question is - what are people doing today to access S3? I am unable to find an older JAR of the AWS SDK to test with. Thanks, Ashish -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-5-1-Hadoop2-6-unable

Re: PySpark MLlib: py4j cannot find trainImplicitALSModel method

2015-07-08 Thread Ashish Dutt
/6384491/00-Setup-IPython-PySpark.ipynb Thanks, Ashish On Wed, Jul 8, 2015 at 5:49 PM, sooraj soora...@gmail.com wrote: That turned out to be a silly data type mistake. At one point in the iterative call, I was passing an integer value for the parameter 'alpha' of the ALS train API, which

Re: Parallelizing multiple RDD / DataFrame creation in Spark

2015-07-08 Thread Ashish Dutt
Thank you Akhil for the link Sincerely, Ashish Dutt PhD Candidate Department of Information Systems University of Malaya, Lembah Pantai, 50603 Kuala Lumpur, Malaysia On Wed, Jul 8, 2015 at 3:43 PM, Akhil Das ak...@sigmoidanalytics.com wrote: Have a look http://alvinalexander.com/scala/how

Re: PySpark MLlib: py4j cannot find trainImplicitALSModel method

2015-07-08 Thread Ashish Dutt
and hence not much help to me. I am able to launch ipython on localhost but cannot get it to work on the cluster Sincerely, Ashish Dutt On Wed, Jul 8, 2015 at 5:49 PM, sooraj soora...@gmail.com wrote: That turned out to be a silly data type mistake. At one point in the iterative call, I

How to upgrade Spark version in CDH 5.4

2015-07-08 Thread Ashish Dutt
--7dc6__section_zd5_1yz_l4 but I do not see any thing relevant Any suggestions directing to a solution are welcome. Thanks, Ashish

Re: Getting started with spark-scala developemnt in eclipse.

2015-07-08 Thread Ashish Dutt
Hello Prateek, I started with getting the pre built binaries so as to skip the hassle of building them from scratch. I am not familiar with scala so can't comment on it. I have documented my experiences on my blog www.edumine.wordpress.com Perhaps it might be useful to you. On 08-Jul-2015 9:39

Connecting to nodes on cluster

2015-07-08 Thread Ashish Dutt
Sincerely, Ashish Dutt

Re: Parallelizing multiple RDD / DataFrame creation in Spark

2015-07-08 Thread Ashish Dutt
Thanks for your reply Akhil. How do you multithread it? Sincerely, Ashish Dutt On Wed, Jul 8, 2015 at 3:29 PM, Akhil Das ak...@sigmoidanalytics.com wrote: Whats the point of creating them in parallel? You can multi-thread it run it in parallel though. Thanks Best Regards On Wed, Jul 8

Re: Connecting to nodes on cluster

2015-07-08 Thread Ashish Dutt
The error is JVM has not responded after 10 seconds. On 08-Jul-2015 10:54 PM, ayan guha guha.a...@gmail.com wrote: What's the error you are getting? On 9 Jul 2015 00:01, Ashish Dutt ashish.du...@gmail.com wrote: Hi, We have a cluster with 4 nodes. The cluster uses CDH 5.4 for the past two

How to verify that the worker is connected to master in CDH5.4

2015-07-07 Thread Ashish Dutt
to the master? Thanks, Ashish

Re: How to verify that the worker is connected to master in CDH5.4

2015-07-07 Thread Ashish Dutt
Thank you Ayan for your response.. But I have just realised that the Spark is configured to be a history server. Please, can somebody suggest to me how can I convert Spark history server to be a Master server? Thank you Sincerely, Ashish Dutt On Wed, Jul 8, 2015 at 12:28 PM, ayan guha guha.a

Re: How to verify that the worker is connected to master in CDH5.4

2015-07-07 Thread Ashish Dutt
initialize the log4j system properly. log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info. Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 15/07/08 11:28:35 INFO SecurityManager: Changing view acls to: Ashish Dutt 15/07/08 11:28:35 INFO

Re: How to verify that the worker is connected to master in CDH5.4

2015-07-07 Thread Ashish Dutt
. All I want for now is how to connect my laptop to the spark cluster machine using either pyspark or SparkR. (I have python 2.7) On my laptop I am using winutils in place of hadoop and have spark 1.4 installed Thank you Sincerely, Ashish Dutt PhD Candidate Department of Information Systems University

Re: Connecting to nodes on cluster

2015-07-09 Thread Ashish Dutt
Hello Akhil, Thanks for the response. I will have to figure this out. Sincerely, Ashish On Thu, Jul 9, 2015 at 3:40 PM, Akhil Das ak...@sigmoidanalytics.com wrote: On Wed, Jul 8, 2015 at 7:31 PM, Ashish Dutt ashish.du...@gmail.com wrote: Hi, We have a cluster with 4 nodes. The cluster

DLL load failed: %1 is not a valid win32 application on invoking pyspark

2015-07-08 Thread Ashish Dutt
. Sincerely, Ashish Dutt

Re: PySpark without PySpark

2015-07-08 Thread Ashish Dutt
written something wrong here. Cannot seem to figure out, what is it? Thank you for your help Sincerely, Ashish Dutt On Thu, Jul 9, 2015 at 11:53 AM, Sujit Pal sujitatgt...@gmail.com wrote: Hi Ashish, Nice post. Agreed, kudos to the author of the post, Benjamin Benfort of District Labs

Re: Driver running out of memory - caused by many tasks?

2015-08-27 Thread Ashish Rangole
I suggest taking a heap dump of the driver process using jmap. Then open that dump in a tool like VisualVM to see which object(s) are taking up heap space. It is easy to do. We did this and found that in our case it was the data structure that stores info about stages, jobs and tasks. There can

Re: Spark shell and StackOverFlowError

2015-08-30 Thread Ashish Shrowty
Do you think I should create a JIRA? On Sun, Aug 30, 2015 at 12:56 PM Ted Yu yuzhih...@gmail.com wrote: I got StackOverFlowError as well :-( On Sun, Aug 30, 2015 at 9:47 AM, Ashish Shrowty ashish.shro...@gmail.com wrote: Yep .. I tried that too earlier. Doesn't make a difference. Are you

Re: Spark shell and StackOverFlowError

2015-08-30 Thread Ashish Shrowty
#broadcast-variables Cheers On Sun, Aug 30, 2015 at 8:54 AM, Ashish Shrowty ashish.shro...@gmail.com wrote: @Sean - Agree that there is no action, but I still get the stackoverflowerror, its very weird @Ted - Variable a is just an int - val a = 10 ... The error happens when I try to pass
