Unsubscribe

2022-07-28 Thread Ashish
Unsubscribe Sent from my iPhone - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Problem of how to retrieve file from HDFS

2019-10-08 Thread Ashish Mittal
te().save("hdfs://localhost:9000/user/hadoop/inpit/data/history.csv"); This code successfully stores the CSV file, but I don't know how to retrieve the CSV file from HDFS. Please help me. Thanks & Regards, Ashish Mittal
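A minimal sketch of reading the file back, assuming a Spark 2.x session and the same HDFS path quoted above:

    val df = spark.read
      .option("header", "true")  // assumes the CSV was written with a header row
      .csv("hdfs://localhost:9000/user/hadoop/inpit/data/history.csv")
    df.show()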

Re: Spark Streaming to REST API

2017-12-21 Thread ashish rawat
Sorry for not making it explicit. We are using Spark Streaming as the streaming solution, and I was wondering if it is a common pattern to do per-tuple Redis reads/writes and write to a REST API through Spark Streaming. Regards, Ashish On Fri, Dec 22, 2017 at 4:00 AM, Gourav Sengupta <gourav.se

Spark Streaming to REST API

2017-12-21 Thread ashish rawat
into redis. Also, we need to write the final output to a system through a REST API (the system doesn't provide any other mechanism to write). Is it a common pattern to read/write to a DB per tuple? Also, are there any connectors to write to REST endpoints? Regards, Ashish
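The usual pattern is to work per partition rather than per tuple, so client setup isn't paid for every record. A hedged sketch, assuming a plain Jedis client for Redis and java.net.HttpURLConnection for the REST write (endpoint, host, and record type are all placeholders):

    dstream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        val jedis = new redis.clients.jedis.Jedis("localhost", 6379) // one client per partition
        records.foreach { rec =>
          val enriched = rec + "|" + jedis.get(rec)                  // per-tuple Redis read
          val conn = new java.net.URL("http://host/ingest")          // hypothetical endpoint
            .openConnection().asInstanceOf[java.net.HttpURLConnection]
          conn.setRequestMethod("POST")
          conn.setDoOutput(true)
          val out = conn.getOutputStream
          out.write(enriched.getBytes("UTF-8"))
          out.close()
          conn.getResponseCode                                       // check status in real code
          conn.disconnect()
        }
        jedis.close()
      }
    }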

Re: NLTK with Spark Streaming

2017-12-01 Thread ashish rawat
r Hakobian, Ph.D. Staff Data Scientist Rally Health nicholas.hakob...@rallyhealth.com On Sun, Nov 26, 2017 at 8:19 AM, ashish rawat <dceash...@gmail.com> wrote: > Thanks Holden and Chetan. > > Holden - Have you tried it out, do you know the right way to do it? > Chetan - yes, if we

Re: NLTK with Spark Streaming

2017-11-26 Thread ashish rawat
at 3:31 PM, Holden Karau <hol...@pigscanfly.ca> > wrote: > >> So it’s certainly doable (it’s not super easy mind you), but until the >> arrow udf release goes out it will be rather slow. >> >> On Sun, Nov 26, 2017 at 8:01 AM ashish rawat <dceash...@gmail.com&

NLTK with Spark Streaming

2017-11-25 Thread ashish rawat
flexibility. Regards, Ashish

Re: Spark based Data Warehouse

2017-11-17 Thread ashish rawat
Thanks, everyone, for the suggestions. Do any of you take care of automatically scaling your underlying Spark clusters up and down on AWS? On Nov 14, 2017 10:46 AM, "lucas.g...@gmail.com" <lucas.g...@gmail.com> wrote: Hi Ashish, bear in mind that EMR has some additional tooling availab

Re: Spark based Data Warehouse

2017-11-13 Thread ashish rawat
v 11, 2017 at 11:21 PM ashish rawat <dceash...@gmail.com> wrote: > Hello Everyone, > > I was trying to understand if anyone here has tried a data warehouse > solution using S3 and Spark SQL. Out of multiple possible options > (redshift, presto, hive etc), we were planning to go

Re: Spark based Data Warehouse

2017-11-13 Thread ashish rawat
? If one user fires a big query, then would that choke all other queries in the cluster? Regards, Ashish On Mon, Nov 13, 2017 at 3:10 AM, Patrick Alwell <palw...@hortonworks.com> wrote: > Alcon, > > > > You can most certainly do this. I’ve done benchmarking with Spark SQL and >
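One standard answer to the "one big query chokes the rest" concern is Spark's fair scheduler, which isolates concurrent jobs into pools. A minimal sketch (the pool name "adhoc" is an assumption; pool weights can be tuned further in fairscheduler.xml):

    val conf = new org.apache.spark.SparkConf().set("spark.scheduler.mode", "FAIR")
    val sc = new org.apache.spark.SparkContext(conf)
    // Each user session tags its jobs with a pool so they share the cluster fairly
    sc.setLocalProperty("spark.scheduler.pool", "adhoc")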

Re: Spark based Data Warehouse

2017-11-12 Thread ashish rawat
, I might be wrong, but not all Spark functionality spills to disk, so it still doesn't provide DB-like reliability in execution. In the case of DBs, queries get slow but they don't fail or go out of memory, specifically in concurrent-user scenarios. Regards, Ashish On Nov 12, 2017 3:02 PM

Spark based Data Warehouse

2017-11-11 Thread ashish rawat
? Considering Spark still does not provide spill to disk, in many scenarios, are there frequent query failures when executing concurrent queries 4. Are there any open source implementations, which provide something similar? Regards, Ashish

Re: Azure Event Hub with Pyspark

2017-04-20 Thread Ashish Singh
Hi, you can try https://github.com/hdinsight/spark-eventhubs, which is an Event Hub receiver for Spark Streaming. We are using it, but I guess it is available in a Scala version only. Thanks, Ashish Singh On Fri, Apr 21, 2017 at 9:19 AM, ayan guha <guha.a...@gmail.com> wrote: > [image: Boxb

Spark 2.0 issue

2016-09-29 Thread Ashish Shrowty
JIRA too .. SPARK-17709 <https://issues.apache.org/jira/browse/SPARK-17709> Any help appreciated! Thanks, Ashish -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-2-0-issue-tp27818.html Sent from the Apache Spark User List mailing list archi

Spark can't connect to secure phoenix

2016-09-16 Thread Ashish Gupta
Hi All, I am running a Spark program on a secured cluster which creates a SqlContext for creating a dataframe over a Phoenix table. When I run my program in local mode with the --master option set to local[2], my program works completely fine; however, when I try to run the same program with the master option set

Returning DataFrame as Scala method return type

2016-09-08 Thread Ashish Tadose
driver will cause all data to get passed to the driver code, or would it return just a pointer to the DF? Thanks, Ashish
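Returning a DataFrame hands back a lazy plan, not data; rows move only when an action runs. A minimal sketch, with a hypothetical path:

    def loadHistory(sqlContext: org.apache.spark.sql.SQLContext, path: String): org.apache.spark.sql.DataFrame =
      sqlContext.read.parquet(path)  // builds a logical plan; nothing is read yet

    val df = loadHistory(sqlContext, "hdfs:///data/history")  // hypothetical path
    df.count()  // only now do executors touch the data; collect() would pull rows to the driver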

Logstash to collect Spark logs

2016-05-20 Thread Ashish Kumar Singh
We are trying to collect Spark logs using Logstash for parsing app logs and collecting useful info. We can read the NodeManager logs but are unable to read Spark application logs using Logstash. Current setup for Spark logs and Logstash: 1- Spark runs on YARN. 2- Using log4j socketAppenders to

Re: Joining a RDD to a Dataframe

2016-05-08 Thread Ashish Dubey
Is there any reason you don't want to convert this? I don't think a join between an RDD and a DF is supported. On Sat, May 7, 2016 at 11:41 PM, Cyril Scetbon wrote: > Hi, > > I have a RDD built during a spark streaming job and I'd like to join it to > a DataFrame (E/S input) to
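Converting the RDD to a DataFrame first is the usual route. A sketch assuming the Spark 1.6-era API from this thread, with a hypothetical schema (Event) and a hypothetical Elasticsearch-backed DataFrame named esDF:

    import sqlContext.implicits._
    case class Event(id: String, payload: String)  // hypothetical schema for the RDD
    val eventsDF = eventsRdd.map { case (id, p) => Event(id, p) }.toDF()
    val joined = esDF.join(eventsDF, "id")         // an ordinary DataFrame join from here on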

Re: sqlCtx.read.parquet yields lots of small tasks

2016-05-08 Thread Ashish Dubey
fic configurations I set: > -- > spark.sql.parquet.filterPushdown: true > spark.sql.parquet.mergeSchema: true > > Thanks, > J. > > On Sat, May 7, 2016 at 4:20 PM, Ashish Dubey <ashish@gmail.com> wrote: > >> How big is your file and can you also share the code snippet >> &

Re: BlockManager crashing applications

2016-05-08 Thread Ashish Dubey
access to a cache block. > On May 8, 2016 5:55 PM, "Ashish Dubey" <ashish@gmail.com> wrote: > > Brandon, > > how much memory are you giving to your executors - did you check if there > were dead executors in your application logs.. Most likely you require > hig

Re: BlockManager crashing applications

2016-05-08 Thread Ashish Dubey
Brandon, how much memory are you giving to your executors? Did you check if there were dead executors in your application logs? Most likely you require higher memory for executors. Ashish On Sun, May 8, 2016 at 1:01 PM, Brandon White <bwwintheho...@gmail.com> wrote: > Hello all,

Re: Parse Json in Spark

2016-05-08 Thread Ashish Dubey
This limit is due to the underlying InputFormat implementation. You can always write your own InputFormat and then use Spark's newAPIHadoopFile API to pass your InputFormat class. You will have to place the jar file in the /lib location on all the nodes. Ashish On Sun, May 8, 2016 at 4:02 PM
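A hedged sketch of wiring in a custom InputFormat; JsonRecordInputFormat and the path are hypothetical, and the jar containing the class must be on every node:

    import org.apache.hadoop.io.{LongWritable, Text}
    // JsonRecordInputFormat would split on JSON record boundaries instead of lines
    val records = sc.newAPIHadoopFile[LongWritable, Text, JsonRecordInputFormat](
      "hdfs:///data/big.json").map(_._2.toString)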

Re: How to verify if spark is using kryo serializer for shuffle

2016-05-07 Thread Ashish Dubey
your driver heap size and application structure ( num of stages and tasks ) Ashish On Saturday, May 7, 2016, Nirav Patel <npa...@xactlycorp.com> wrote: > Right but this logs from spark driver and spark driver seems to use Akka. > > ERROR [sparkDriver-akka.actor.defaul
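For reference, Kryo is enabled through SparkConf, and the active serializer can be read back from the conf; a minimal sketch (MyRecord is a placeholder class):

    val conf = new org.apache.spark.SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .registerKryoClasses(Array(classOf[MyRecord]))  // optional; keeps serialized output compact
    // Verify at runtime:
    sc.getConf.get("spark.serializer")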

Re: sqlCtx.read.parquet yields lots of small tasks

2016-05-07 Thread Ashish Dubey
How big is your file and can you also share the code snippet On Saturday, May 7, 2016, Johnny W. wrote: > hi spark-user, > > I am using Spark 1.6.0. When I call sqlCtx.read.parquet to create a > dataframe from a parquet data source with a single parquet file, it yields > a

Re: Spark for Log Analytics

2016-03-31 Thread ashish rawat
ache/Nginx/Mongo etc) to Kafka, what could be the ideal strategy? Regards, Ashish On Thu, Mar 31, 2016 at 5:16 PM, Chris Fregly <ch...@fregly.com> wrote: > oh, and I forgot to mention Kafka Streams which has been heavily talked > about the last few days at Strata here in San Jose. &g

Spark for Log Analytics

2016-03-31 Thread ashish rawat
for the complex use cases, while logstash filters can be used for the simpler use cases. I was wondering if someone has already done this evaluation and could provide me some pointers on how/if to create this pipeline with Spark. Regards, Ashish

Re: Problem mixing MESOS Cluster Mode and Docker task execution

2016-03-10 Thread Ashish Soni
When you say the driver is running on Mesos, can you explain how you are doing that? > On Mar 10, 2016, at 4:44 PM, Eran Chinthaka Withana > wrote: > > Yanling I'm already running the driver on mesos (through docker). FYI, I'm > running this on cluster mode with

Re: Problem mixing MESOS Cluster Mode and Docker task execution

2016-03-10 Thread Ashish Soni
Hi Tim, can you please share your Dockerfiles and configuration, as it will help a lot; I am planning to publish a blog post on the same. Ashish On Thu, Mar 10, 2016 at 10:34 AM, Timothy Chen <t...@mesosphere.io> wrote: > No you don't need to install spark on each slave, we have bee

Re: Problem mixing MESOS Cluster Mode and Docker task execution

2016-03-10 Thread Ashish Soni
You need to install Spark on each Mesos slave, and then while starting the container set its workdir to your Spark home so that it can find the Spark classes. Ashish > On Mar 10, 2016, at 5:22 AM, Guillaume Eynard Bontemps > <g.eynard.bonte...@gmail.com> wrote: > > For an answer

Re: Spark 1.5 on Mesos

2016-03-04 Thread Ashish Soni
since the slave is in a > chroot. > > Can you try mounting in a volume from the host when you launch the slave > for your slave's workdir? > docker run -v /tmp/mesos/slave:/tmp/mesos/slave mesos_image mesos-slave > --work_dir=/tmp/mesos/slave > > Tim > > On Thu, Mar 3, 2016 a

Re: Spark 1.5 on Mesos

2016-03-02 Thread Ashish Soni
On Wed, Mar 2, 2016 at 5:49 PM, Charles Allen <charles.al...@metamarkets.com > wrote: > @Tim yes, this is asking about 1.5 though > > On Wed, Mar 2, 2016 at 2:35 PM Tim Chen <t...@mesosphere.io> wrote: > >> Hi Charles, >> >> I thought that's fixed wi

Re: Spark 1.5 on Mesos

2016-03-02 Thread Ashish Soni
I have had no luck, and I would like to ask the Spark committers: will this ever be designed to run on Mesos? A Spark app as a Docker container is not working at all on Mesos; if anyone would like the code, I can send it over to have a look. Ashish On Wed, Mar 2, 2016 at 12:23 PM, Sathish Kumaran

Re: Spark 1.5 on Mesos

2016-03-02 Thread Ashish Soni
utor log from > steer file and see what the problem is? > > Tim > > On Mar 1, 2016, at 8:05 AM, Ashish Soni <asoni.le...@gmail.com> wrote: > > Not sure what is the issue but i am getting below error when i try to run > spark PI example > > Blacklisting Mesos slave value: &

Spark Submit using Convert to Marathon REST API

2016-03-01 Thread Ashish Soni
Hi All, can someone please help me translate the below spark-submit invocation into a Marathon JSON request? docker run -it --rm -e SPARK_MASTER="mesos://10.0.2.15:5050" -e SPARK_IMAGE="spark_driver:latest" spark_driver:latest /opt/spark/bin/spark-submit --name "PI Example" --class
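A hedged sketch of an equivalent Marathon app definition (field values mirror the docker run above; the main class and jar are placeholders because the original command is truncated):

    {
      "id": "/spark-pi",
      "cpus": 1,
      "mem": 1024,
      "instances": 1,
      "container": {
        "type": "DOCKER",
        "docker": { "image": "spark_driver:latest", "network": "HOST" }
      },
      "env": { "SPARK_MASTER": "mesos://10.0.2.15:5050", "SPARK_IMAGE": "spark_driver:latest" },
      "cmd": "/opt/spark/bin/spark-submit --name 'PI Example' --class <main-class> <app-jar>"
    }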

Re: Spark 1.5 on Mesos

2016-03-01 Thread Ashish Soni
k your Mesos UI if you see Spark application in the > Frameworks tab > > On Mon, Feb 29, 2016 at 12:23 PM Ashish Soni <asoni.le...@gmail.com> > wrote: > >> What is the Best practice , I have everything running as docker container >> in single host ( mesos and marathon

Re: Spark 1.5 on Mesos

2016-02-29 Thread Ashish Soni
mesosphere/spark:1.6) > and Mesos will automatically launch docker containers for you. > > Tim > > On Mon, Feb 29, 2016 at 7:36 AM, Ashish Soni <asoni.le...@gmail.com> > wrote: > >> Yes i read that and not much details here. >> >> Is it true that we nee

Re: Spark 1.5 on Mesos

2016-02-29 Thread Ashish Soni
Yes, I read that, and there are not many details there. Is it true that we need to have Spark installed in each Mesos Docker container (master and slave)? Ashish On Fri, Feb 26, 2016 at 2:14 PM, Tim Chen <t...@mesosphere.io> wrote: > https://spark.apache.org/docs/latest/running-on-mesos.ht

Spark 1.5 on Mesos

2016-02-26 Thread Ashish Soni
Hi All, is there any proper documentation on how to run Spark on Mesos? I have been trying for the last few days and am not able to make it work. Please help. Ashish

SPARK-9559

2016-02-18 Thread Ashish Soni
Hi All, just wanted to know if there is any workaround or resolution for the below issue in standalone mode: https://issues.apache.org/jira/browse/SPARK-9559 Ashish

Separate Log4j.xml for Spark and Application JAR (Application vs Spark)

2016-02-12 Thread Ashish Soni
Hi All, as per my best understanding we can have only one log4j configuration for both Spark and the application, as whichever comes first in the classpath takes precedence. Is there any way we can keep one in the application and one in the Spark conf folder? Is that possible? Thanks

Re: Spark Submit

2016-02-12 Thread Ashish Soni
; spark-submit --conf "spark.executor.memory=512m" --conf > "spark.executor.extraJavaOptions=x" --conf "Dlog4j.configuration=log4j.xml" > > Sent from Samsung Mobile. > > > Original message > From: Ted Yu <yuzhih...@gmail.com&g

Spark Submit

2016-02-12 Thread Ashish Soni
Hi All, how do I pass multiple configuration parameters to spark-submit? Please help; I am trying as below: spark-submit --conf "spark.executor.memory=512m spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.xml" Thanks,
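As the reply above shows, each property generally takes its own --conf flag; a sketch:

    spark-submit --conf "spark.executor.memory=512m" \
      --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.xml" ...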

Dynamically Change Log Level Spark Streaming

2016-02-08 Thread Ashish Soni
Hi All, how do I change the log level for a running Spark Streaming job? Any help will be appreciated. Thanks,

Example of onEnvironmentUpdate Listener

2016-02-08 Thread Ashish Soni
Are there any examples of how to implement the onEnvironmentUpdate method for a custom listener? Thanks,
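A minimal custom-listener sketch, assuming the standard SparkListener API (the class name and what it prints are placeholders):

    import org.apache.spark.scheduler.{SparkListener, SparkListenerEnvironmentUpdate}

    class EnvUpdateListener extends SparkListener {
      override def onEnvironmentUpdate(update: SparkListenerEnvironmentUpdate): Unit = {
        // environmentDetails groups properties by section, e.g. "Spark Properties"
        val sparkProps = update.environmentDetails.getOrElse("Spark Properties", Seq.empty)
        sparkProps.foreach { case (k, v) => println(s"$k = $v") }
      }
    }
    sc.addSparkListener(new EnvUpdateListener)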

Redirect Spark Logs to Kafka

2016-02-01 Thread Ashish Soni
Hi All, please let me know how we can redirect Spark log files, or tell Spark to log to a Kafka queue instead of files. Ashish

Determine Topic MetaData Spark Streaming Job

2016-01-25 Thread Ashish Soni
ashMap<TopicAndPartition, Long>(); fromOffsets.put(new TopicAndPartition(driverArgs.inputTopic, 0), 0L); Thanks, Ashish

Re: Determine Topic MetaData Spark Streaming Job

2016-01-25 Thread Ashish Soni
, what is the correct approach. Ashish On Mon, Jan 25, 2016 at 11:38 AM, Gerard Maas <gerard.m...@gmail.com> wrote: > What are you trying to achieve? > > Looks like you want to provide offsets but you're not managing them > and I'm assuming you're using the direct stream approach
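If the offsets aren't being managed externally, the simpler direct-stream overload avoids hand-building a fromOffsets map; a sketch against the Spark 1.x Kafka API used in this thread (broker address, topic name, and ssc are assumed):

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092",
                          "auto.offset.reset" -> "smallest") // start from the earliest offset
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("inputTopic"))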

How to change the no of cores assigned for a Submitted Job

2016-01-12 Thread Ashish Soni
Hi, I see strange behavior when creating a standalone Spark container using Docker. Not sure why, but by default it assigns 4 cores to the first job submitted, and all the other jobs are then in a wait state. Please suggest if there is a setting to change this; I tried --executor-cores 1 but

Deployment and performance related queries for Spark and Cassandra

2015-12-21 Thread Ashish Gadkari
any performance-related parameters in Spark, Cassandra, or Solr which will reduce the job time. Any help to increase the performance will be appreciated. Thanks -- Ashish Gadkari

Discover SparkUI port for spark streaming job running in cluster mode

2015-12-14 Thread Ashish Nigam
:50571 INFO util.Utils: Successfully started service 'SparkUI' on port 50571. INFO ui.SparkUI: Started SparkUI at http://xxx:50571 Is there any way to know about the UI port automatically using some API? Thanks Ashish

Re: Save GraphX to disk

2015-11-20 Thread Ashish Rawat
Hi Todd, Could you please provide an example of doing this. Mazerunner seems to be doing something similar with Neo4j but it goes via hdfs and updates only the graph properties. Is there a direct way to do this with Neo4j or Titan? Regards, Ashish From: SLiZn Liu <sliznmail...@gmail.

Spark 1.5.1+Hadoop2.6 .. unable to write to S3 (HADOOP-12420)

2015-10-22 Thread Ashish Shrowty
://issues.apache.org/jira/browse/HADOOP-12420) My question is - what are people doing today to access S3? I am unable to find an older JAR of the AWS SDK to test with. Thanks, Ashish -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-5-1-Hadoop2-6-unable

Re: question on make multiple external calls within each partition

2015-10-05 Thread Ashish Soni
Need more details, but you might want to filter the data first (create multiple RDDs) and then process them. > On Oct 5, 2015, at 8:35 PM, Chen Song wrote: > > We have a use case with the following design in Spark Streaming. > > Within each batch, > * data is read and

Re: DStream Transformation to save JSON in Cassandra 2.1

2015-10-05 Thread Ashish Soni
Try this: you can use dstream.map to convert it to a JavaDStream with only the data you are interested in (probably return a POJO of your JSON), then call foreachRDD and inside that call the line below: javaFunctions(rdd).writerBuilder("table", "keyspace", mapToRow(Class.class)).saveToCassandra(); On
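For reference, the Scala connector equivalent is a one-liner per RDD; a sketch where Review, parseToReview, and the keyspace/table names are placeholders:

    import com.datastax.spark.connector._  // spark-cassandra-connector

    case class Review(id: String, body: String)  // hypothetical row class
    dstream.map(json => parseToReview(json))     // parseToReview: JSON -> Review, a placeholder
           .foreachRDD(rdd => rdd.saveToCassandra("keyspace", "table"))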

Re: automatic start of streaming job on failure on YARN

2015-10-02 Thread Ashish Rangole
Are you running the job in yarn cluster mode? On Oct 1, 2015 6:30 AM, "Jeetendra Gangele" wrote: > We've a streaming application running on yarn and we would like to ensure > that is up running 24/7. > > Is there a way to tell yarn to automatically restart a specific >

Re: Spark Streaming Log4j Inside Eclipse

2015-09-29 Thread Ashish Soni
I am using the Java streaming context, and it doesn't have a setLogLevel method; I have also tried passing a VM argument in Eclipse, and it doesn't work. JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, Durations.seconds(2)); Ashish On Tue, Sep 29, 2015 at 7:23 AM, Adrian Tanase <a
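When running inside an IDE without spark-submit, setting the level through the log4j 1.x API directly is a common workaround; a minimal sketch (call it before creating the context):

    import org.apache.log4j.{Level, LogManager}
    LogManager.getRootLogger.setLevel(Level.WARN)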

Re: Spark Streaming Log4j Inside Eclipse

2015-09-28 Thread Ashish Soni
I am not running it using spark-submit; I am running locally inside the Eclipse IDE. How do I set this using Java code? Ashish On Mon, Sep 28, 2015 at 10:42 AM, Adrian Tanase <atan...@adobe.com> wrote: > You also need to provide it as parameter to spark submit > > http://stackoverflo

Spark Streaming Log4j Inside Eclipse

2015-09-28 Thread Ashish Soni
at DEBUG or WARN Ashish

Spark Streaming and Kafka MultiNode Setup - Data Locality

2015-09-21 Thread Ashish Soni
Hi All, just wanted to find out if there is any benefit to installing Kafka brokers and Spark nodes on the same machines. Is it possible for Spark to pull data from Kafka locally, i.e., when the broker or partition is on the same machine? Thanks, Ashish

Spark Cassandra Filtering

2015-09-16 Thread Ashish Soni
Hi, how can I pass a dynamic value to the below function to filter on, instead of a hardcoded one? I have an existing RDD and would like to use its data for the filter, so instead of doing .where("name=?","Anna") I want to do .where("name=?",someobject.value). Please help. JavaRDD rdd3 =
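Passing a variable as the bound value is exactly how the connector's where() is meant to be used; a sketch with the Scala API (keyspace, table, and someObject are placeholders):

    import com.datastax.spark.connector._
    val name = someObject.value  // any runtime value
    val rdd = sc.cassandraTable("keyspace", "table").where("name = ?", name)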

Dynamic Workflow Execution using Spark

2015-09-15 Thread Ashish Soni
Hi All, are there any frameworks that can be used to execute workflows within Spark? Or is it possible to use an ML Pipeline for workflow execution without doing ML? Thanks, Ashish

Re: ArrayIndexOutOfBoundsException when using repartitionAndSortWithinPartitions()

2015-09-10 Thread Ashish Shenoy
{ return -1; } else { return 1; } } } ... Thanks, Ashish On Wed, Sep 9, 2015 at 5:13 PM, Ted Yu <yuzhih...@gmail.com> wrote: > Which release of Spark are you using ? > > Can you show skeleton of your partitioner and comparator ? > > Thanks >

Re: ArrayIndexOutOfBoundsException when using repartitionAndSortWithinPartitions()

2015-09-10 Thread Ashish Shenoy
Yup thanks Ted. My getPartition() method had a bug where a signed int was being moduloed with the number of partitions. Fixed that. Thanks, Ashish On Thu, Sep 10, 2015 at 10:44 AM, Ted Yu <yuzhih...@gmail.com> wrote: > Here is snippet of ExternalSorter.scala where ArrayIndexOutOfBounds
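The classic fix, for reference: the JVM's % operator keeps the sign of the dividend, so a negative hashCode yields a negative partition index. A sketch:

    import org.apache.spark.Partitioner

    class SafeHashPartitioner(override val numPartitions: Int) extends Partitioner {
      override def getPartition(key: Any): Int = {
        val mod = key.hashCode % numPartitions
        if (mod < 0) mod + numPartitions else mod  // shift negatives into [0, numPartitions)
      }
    }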

Re: hadoop2.6.0 + spark1.4.1 + python2.7.10

2015-09-09 Thread Ashish Dutt
Dear Sasha, What I did was that I installed the parcels on all the nodes of the cluster. Typically the location was /opt/cloudera/parcels/CDH5.4.2-1.cdh5.4.2.p0.2 Hope this helps you. With regards, Ashish On Tue, Sep 8, 2015 at 10:18 PM, Sasha Kacanski <skacan...@gmail.com> wrote:

ArrayIndexOutOfBoundsException when using repartitionAndSortWithinPartitions()

2015-09-09 Thread Ashish Shenoy
ark's code and not my application code. Can you pls point out what I am doing wrong ? Thanks, Ashish

Re: hadoop2.6.0 + spark1.4.1 + python2.7.10

2015-09-07 Thread Ashish Dutt
orker, it works too. I am not sure if this will help or not for your use-case. Sincerely, Ashish On Mon, Sep 7, 2015 at 11:04 PM, Sasha Kacanski <skacan...@gmail.com> wrote: > Thanks Ashish, > nice blog but does not cover my issue. Actually I have pycharm running and > loading

Re: hadoop2.6.0 + spark1.4.1 + python2.7.10

2015-09-06 Thread Ashish Dutt
flow <http://stackoverflow.com/search?q=no+module+named+pyspark> website Sincerely, Ashish Dutt On Mon, Sep 7, 2015 at 7:17 AM, Sasha Kacanski <skacan...@gmail.com> wrote: > Hi, > I am successfully running python app via pyCharm in local mode > setMaster("local[*]")

Re: FlatMap Explanation

2015-09-03 Thread Ashish Soni
Thanks a lot everyone. Very Helpful. Ashish On Thu, Sep 3, 2015 at 2:19 AM, Zalzberg, Idan (Agoda) < idan.zalzb...@agoda.com> wrote: > Hi, > > Yes, I can explain > > > > 1 to 3 -> 1,2,3 > > 2 to 3- > 2,3 > > 3 to 3 -> 3 > > 3 to 3 -> 3 &

FlatMap Explanation

2015-09-02 Thread Ashish Soni
Hi, can someone please explain the output of flatMap on the RDD below? {1, 2, 3, 3} rdd.flatMap(x => x.to(3)) gives the output below: {1, 2, 3, 2, 3, 3, 3} I am not able to understand how this output came about. Thanks,
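As the reply above spells out, x.to(3) expands each element into the range from x up to 3, and flatMap concatenates the resulting ranges:

    val rdd = sc.parallelize(Seq(1, 2, 3, 3))
    rdd.flatMap(x => x.to(3)).collect()
    // 1.to(3) = 1,2,3; 2.to(3) = 2,3; 3.to(3) = 3; 3.to(3) = 3  =>  Array(1, 2, 3, 2, 3, 3, 3)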

Re: Spark shell and StackOverFlowError

2015-08-31 Thread Ashish Shrowty
PM Ted Yu <yuzhih...@gmail.com> wrote: > Ashish: > Can you post the complete stack trace for NotSerializableException ? > > Cheers > > On Mon, Aug 31, 2015 at 8:49 AM, Ashish Shrowty <ashish.shro...@gmail.com> > wrote: > >> bcItemsIdx is just a broadcast va

Re: Spark shell and StackOverFlowError

2015-08-31 Thread Ashish Shrowty
Yes .. I am closing the stream. Not sure what you meant by "bq. and then create rdd"? -Ashish On Mon, Aug 31, 2015 at 1:02 PM Ted Yu <yuzhih...@gmail.com> wrote: > I am not familiar with your code. > > bq. and then create the rdd > > I assume you call O

Re: Spark shell and StackOverFlowError

2015-08-30 Thread Ashish Shrowty
Do you think I should create a JIRA? On Sun, Aug 30, 2015 at 12:56 PM Ted Yu yuzhih...@gmail.com wrote: I got StackOverFlowError as well :-( On Sun, Aug 30, 2015 at 9:47 AM, Ashish Shrowty ashish.shro...@gmail.com wrote: Yep .. I tried that too earlier. Doesn't make a difference. Are you

Re: Spark shell and StackOverFlowError

2015-08-30 Thread Ashish Shrowty
#broadcast-variables Cheers On Sun, Aug 30, 2015 at 8:54 AM, Ashish Shrowty ashish.shro...@gmail.com wrote: @Sean - Agree that there is no action, but I still get the stackoverflowerror, its very weird @Ted - Variable a is just an int - val a = 10 ... The error happens when I try to pass

Re: Driver running out of memory - caused by many tasks?

2015-08-27 Thread Ashish Rangole
I suggest taking a heap dump of the driver process using jmap. Then open that dump in a tool like VisualVM to see which object(s) are taking up heap space. It is easy to do. We did this and found out that in our case it was the data structure that stores info about stages, jobs and tasks. There can
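A typical way to take that dump (standard JDK jmap flags; the pid is the driver's):

    jmap -dump:live,format=b,file=driver-heap.hprof <driver-pid>

Then open driver-heap.hprof in VisualVM or Eclipse MAT.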

Re: Worker Machine running out of disk for Long running Streaming process

2015-08-22 Thread Ashish Rangole
Interesting. TD, can you please throw some light on why this is and point to the relevant code in Spark repo. It will help in a better understanding of things that can affect a long running streaming job. On Aug 21, 2015 1:44 PM, Tathagata Das t...@databricks.com wrote: Could you periodically

Java Streaming Context - File Stream use

2015-08-10 Thread Ashish Soni
Please help, as I am not sure what is incorrect with the below code; it gives me a compilation error in Eclipse. SparkConf sparkConf = new SparkConf().setMaster("local[4]").setAppName("JavaDirectKafkaWordCount"); JavaStreamingContext jssc = new JavaStreamingContext(sparkConf,

PySpark in Pycharm- unable to connect to remote server

2015-08-05 Thread Ashish Dutt
) at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79) at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68) at py4j.GatewayConnection.run(GatewayConnection.java:207) at java.lang.Thread.run(Thread.java:745) Traceback (most recent call last): File C:/Users/ashish dutt/PycharmProjects

How to connect to remote HDFS programmatically to retrieve data, analyse it and then write the data back to HDFS?

2015-08-05 Thread Ashish Dutt
) at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79) at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68) at py4j.GatewayConnection.run(GatewayConnection.java:207) at java.lang.Thread.run(Thread.java:745) Traceback (most recent call last): File C:/Users/ashish dutt/PycharmProjects

Class Loading Issue - Spark Assembly and Application Provided

2015-07-21 Thread Ashish Soni
Hi All, I am having a class loading issue: the Spark assembly uses Google Guice internally, and one of the jars I am using depends on sisu-guice-3.1.0-no_aop.jar. How do I load my classes first, so that it doesn't result in an error, and tell Spark to load its assembly later? Ashish

XML Parsing

2015-07-19 Thread Ashish Soni
Hi All, I have an XML file with the same tag repeated multiple times, as below. Please suggest what would be the best way to process this data inside Spark: how can I extract each opening and closing tag and process them, or how can I combine multiple lines into a single line? <review> ... </review> <review>

BroadCast on Interval ( eg every 10 min )

2015-07-16 Thread Ashish Soni
Hi All, how can I broadcast a data change to all the executors every 10 minutes or every 1 minute? Ashish
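Broadcast variables are immutable, so the usual workaround is to rebuild the broadcast on a schedule from the driver; a hedged sketch where loadReferenceData() is a placeholder for however the data is fetched:

    import org.apache.spark.SparkContext
    import org.apache.spark.broadcast.Broadcast

    object RefData {
      @volatile private var bc: Broadcast[Map[String, String]] = _
      @volatile private var loadedAt = 0L
      def get(sc: SparkContext, ttlMs: Long = 10 * 60 * 1000): Broadcast[Map[String, String]] =
        synchronized {
          if (bc == null || System.currentTimeMillis - loadedAt > ttlMs) {
            if (bc != null) bc.unpersist()          // drop the stale copy on the executors
            bc = sc.broadcast(loadReferenceData())  // loadReferenceData() is hypothetical
            loadedAt = System.currentTimeMillis
          }
          bc
        }
    }
    // In a streaming job: dstream.foreachRDD { rdd => val ref = RefData.get(rdd.sparkContext); ... }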

Re: Data Processing speed SQL Vs SPARK

2015-07-13 Thread Ashish Mukherjee
MySQL and PgSQL scale to millions. Spark or any distributed/clustered computing environment would be inefficient for the kind of data size you mention. That's because of coordination of processes, moving data around etc. On Mon, Jul 13, 2015 at 5:34 PM, Sandeep Giri sand...@knowbigdata.com wrote:

Re: SparkR Error in sparkR.init(master=“local”) in RStudio

2015-07-13 Thread Ashish Dutt
()? where is it in windows environment Thanks for your help Sincerely, Ashish Dutt On Mon, Jul 13, 2015 at 3:48 PM, Sun, Rui rui@intel.com wrote: Hi, Kachau, If you are using SparkR with RStudio, have you followed the guidelines in the section Using SparkR from RStudio in https

Re: Is it possible to change the default port number 7077 for spark?

2015-07-13 Thread Ashish Dutt
Hello Arun, Thank you for the descriptive response. And thank you for providing the sample file too. It certainly is a great help. Sincerely, Ashish On Mon, Jul 13, 2015 at 10:30 PM, Arun Verma arun.verma...@gmail.com wrote: PFA sample file On Mon, Jul 13, 2015 at 7:37 PM, Arun Verma

Re: Connecting to nodes on cluster

2015-07-09 Thread Ashish Dutt
Hello Akhil, Thanks for the response. I will have to figure this out. Sincerely, Ashish On Thu, Jul 9, 2015 at 3:40 PM, Akhil Das ak...@sigmoidanalytics.com wrote: On Wed, Jul 8, 2015 at 7:31 PM, Ashish Dutt ashish.du...@gmail.com wrote: Hi, We have a cluster with 4 nodes. The cluster

Re: PySpark MLlib: py4j cannot find trainImplicitALSModel method

2015-07-08 Thread Ashish Dutt
/6384491/00-Setup-IPython-PySpark.ipynb Thanks, Ashish On Wed, Jul 8, 2015 at 5:49 PM, sooraj soora...@gmail.com wrote: That turned out to be a silly data type mistake. At one point in the iterative call, I was passing an integer value for the parameter 'alpha' of the ALS train API, which

Re: Parallelizing multiple RDD / DataFrame creation in Spark

2015-07-08 Thread Ashish Dutt
Thanks you Akhil for the link Sincerely, Ashish Dutt PhD Candidate Department of Information Systems University of Malaya, Lembah Pantai, 50603 Kuala Lumpur, Malaysia On Wed, Jul 8, 2015 at 3:43 PM, Akhil Das ak...@sigmoidanalytics.com wrote: Have a look http://alvinalexander.com/scala/how

Re: PySpark MLlib: py4j cannot find trainImplicitALSModel method

2015-07-08 Thread Ashish Dutt
and hence not much help to me. I am able to launch ipython on localhost but cannot get it to work on the cluster Sincerely, Ashish Dutt On Wed, Jul 8, 2015 at 5:49 PM, sooraj soora...@gmail.com wrote: That turned out to be a silly data type mistake. At one point in the iterative call, I

How to upgrade Spark version in CDH 5.4

2015-07-08 Thread Ashish Dutt
--7dc6__section_zd5_1yz_l4 but I do not see any thing relevant Any suggestions directing to a solution are welcome. Thanks, Ashish

Re: Getting started with spark-scala developemnt in eclipse.

2015-07-08 Thread Ashish Dutt
Hello Prateek, I started with getting the pre built binaries so as to skip the hassle of building them from scratch. I am not familiar with scala so can't comment on it. I have documented my experiences on my blog www.edumine.wordpress.com Perhaps it might be useful to you. On 08-Jul-2015 9:39

Connecting to nodes on cluster

2015-07-08 Thread Ashish Dutt
Sincerely, Ashish Dutt

Re: Parallelizing multiple RDD / DataFrame creation in Spark

2015-07-08 Thread Ashish Dutt
Thanks for your reply Akhil. How do you multithread it? Sincerely, Ashish Dutt On Wed, Jul 8, 2015 at 3:29 PM, Akhil Das ak...@sigmoidanalytics.com wrote: Whats the point of creating them in parallel? You can multi-thread it run it in parallel though. Thanks Best Regards On Wed, Jul 8

Re: Connecting to nodes on cluster

2015-07-08 Thread Ashish Dutt
The error is JVM has not responded after 10 seconds. On 08-Jul-2015 10:54 PM, ayan guha guha.a...@gmail.com wrote: What's the error you are getting? On 9 Jul 2015 00:01, Ashish Dutt ashish.du...@gmail.com wrote: Hi, We have a cluster with 4 nodes. The cluster uses CDH 5.4 for the past two

DLL load failed: %1 is not a valid win32 application on invoking pyspark

2015-07-08 Thread Ashish Dutt
. Sincerely, Ashish Dutt

Re: PySpark without PySpark

2015-07-08 Thread Ashish Dutt
written something wrong here. Cannot seem to figure out, what is it? Thank you for your help Sincerely, Ashish Dutt On Thu, Jul 9, 2015 at 11:53 AM, Sujit Pal sujitatgt...@gmail.com wrote: Hi Ashish, Nice post. Agreed, kudos to the author of the post, Benjamin Benfort of District Labs

How to verify that the worker is connected to master in CDH5.4

2015-07-07 Thread Ashish Dutt
to the master? Thanks, Ashish

Re: How to verify that the worker is connected to master in CDH5.4

2015-07-07 Thread Ashish Dutt
Thank you, Ayan, for your response. But I have just realised that Spark is configured as a history server. Please, can somebody suggest how I can convert the Spark history server to be a master server? Thank you. Sincerely, Ashish Dutt On Wed, Jul 8, 2015 at 12:28 PM, ayan guha guha.a

Re: How to verify that the worker is connected to master in CDH5.4

2015-07-07 Thread Ashish Dutt
initialize the log4j system properly. log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info. Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties 15/07/08 11:28:35 INFO SecurityManager: Changing view acls to: Ashish Dutt 15/07/08 11:28:35 INFO

Re: How to verify that the worker is connected to master in CDH5.4

2015-07-07 Thread Ashish Dutt
. All I want for now is how to connect my laptop to the spark cluster machine using either pyspark or SparkR. (I have python 2.7) On my laptop I am using winutils in place of hadoop and have spark 1.4 installed Thank you Sincerely, Ashish Dutt PhD Candidate Department of Information Systems University

How Will Spark Execute below Code - Driver and Executors

2015-07-06 Thread Ashish Soni
Hi All, if someone can help me understand which portion of the below code gets executed on the driver and which portion gets executed on the executors, it would be a great help. I have to load data from 10 tables and then use that data in various manipulations, and I am using Spark SQL
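The rule of thumb: plan building and action results live on the driver, while the closures passed to transformations run on the executors. A minimal annotated sketch (url, table name, and props are placeholders):

    val base = sqlContext.read.jdbc(url, "table1", props)  // driver: builds a plan, reads nothing
    val upper = base.filter("amount > 100")                 // driver: still just plan building
      .map(row => row.getString(0).toUpperCase)             // executors: this closure runs on them
    upper.collect()                                         // action: executors compute, driver gathers rows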
