Facing weird problem while reading Parquet

2021-08-10 Thread Prateek Rajput
Parquet and we are doing a simple write and read. For writing: ds.write().parquet(outputPath); // this is writing 40K part files. For reading: sqlContext.read().parquet(inputPath).javaRDD() // here we are trying to read the same 40K part files. Regards, Prateek Rajput
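
A rough Scala sketch of the write/read round trip being described, using the SparkSession API (outputPath/inputPath and the data are placeholders; the original code is Java and reads the result as a JavaRDD):

    import org.apache.spark.sql.SparkSession

    object ParquetRoundTrip {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("ParquetRoundTrip").getOrCreate()
        val outputPath = "hdfs:///tmp/parquet-out"   // placeholder path
        val inputPath  = outputPath

        // Parquet writes one part file per partition, so 40K partitions -> 40K part files;
        // coalescing (or repartitioning) before the write reduces the file count.
        val ds = spark.range(0, 1000000)             // stand-in for the real dataset
        ds.coalesce(200).write.mode("overwrite").parquet(outputPath)

        // Read the same part files back; .rdd here corresponds to .javaRDD() in the Java API.
        val rdd = spark.read.parquet(inputPath).rdd
        println(rdd.count())
        spark.stop()
      }
    }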

Re: Getting EOFFileException while reading from sequence file in spark

2019-05-03 Thread Prateek Rajput
Hi all, please share if anyone has faced the same problem. There are many similar issues on the web, but I did not find any solution or the reason why this happens. It will be really helpful. Regards, Prateek On Mon, Apr 29, 2019 at 3:18 PM Prateek Rajput wrote: > I checked and removed 0 sized fi

Re: How to specify number of Partition using newAPIHadoopFile()

2019-04-30 Thread Prateek Rajput
On Tue, Apr 30, 2019 at 6:48 PM Vatsal Patel wrote: > Issue: > > When I am reading a sequence file in Spark, I can specify the number of > partitions as an argument to the API; below is the way: > public <K, V> JavaPairRDD<K, V> sequenceFile(String path, Class<K> > keyClass, Class<V> valueClass, int
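
newAPIHadoopFile has no minPartitions argument, so a common workaround (a hedged, untested sketch; the split-size value, path, and key/value classes here are assumptions) is to shrink the input format's maximum split size, or simply repartition afterwards:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.io.{BytesWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat
    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("control-splits"))
    val hadoopConf = new Configuration(sc.hadoopConfiguration)
    // Smaller max split size -> more input splits -> more partitions (64 MB is only an example).
    hadoopConf.set("mapreduce.input.fileinputformat.split.maxsize", (64 * 1024 * 1024).toString)

    val rdd = sc.newAPIHadoopFile(
      "hdfs:///data/sequence-input",                          // placeholder path
      classOf[SequenceFileInputFormat[Text, BytesWritable]],  // assumed key/value types
      classOf[Text],
      classOf[BytesWritable],
      hadoopConf)
    println(rdd.getNumPartitions)                              // or: rdd.repartition(n)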

Re: Getting EOFFileException while reading from sequence file in spark

2019-04-29 Thread Prateek Rajput
no such issue is coming; it is happening in the case of Spark only. On Mon, Apr 29, 2019 at 2:50 PM Deepak Sharma wrote: > This can happen if the file size is 0 > > On Mon, Apr 29, 2019 at 2:28 PM Prateek Rajput > wrote: > >> Hi guys, >> I am getting this strange error again and ag
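
Following the suggestion quoted here that zero-byte part files can trigger the EOF error, one defensive approach (a rough, untested sketch; the input directory and key/value classes are placeholders) is to filter them out before building the input path list:

    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.hadoop.io.{BytesWritable, Text}
    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("skip-empty-parts"))
    val fs = FileSystem.get(sc.hadoopConfiguration)

    // Keep only non-empty part files and hand Spark an explicit comma-separated path list.
    val inputDir = new Path("hdfs:///data/sequence-input")     // placeholder path
    val nonEmpty = fs.listStatus(inputDir)
      .filter(s => s.isFile && s.getLen > 0)
      .map(_.getPath.toString)

    val rdd = sc.sequenceFile(nonEmpty.mkString(","), classOf[Text], classOf[BytesWritable])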

Getting EOFFileException while reading from sequence file in spark

2019-04-29 Thread Prateek Rajput
core_2.11 Regards, Prateek

[Mesos] Are InverseOffers ignored?

2018-04-19 Thread Prateek Sharma
guessing that currently we ignore inverse offers completely? Thanks, --Prateek

Re: Error report file is deleted automatically after spark application finished

2016-06-30 Thread prateek arora
imited, but when my spark application crashes it shows the error "Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again". Regards Prateek On Wed, Jun 29, 2016 at 9:30 PM, dhruve ashar <dhruveas...@gmail.com>

Error report file is deleted automatically after spark application finished

2016-06-29 Thread prateek arora
I am not able to find the "/yarn/nm/usercache/ubuntu/appcache/application_1467236060045_0001/container_1467236060045_0001_01_03/hs_err_pid12207.log" file. It is deleted automatically after the Spark application finishes. How can I retain the report file? I am running Spark with YARN. Regards
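
The hs_err_pid log lands inside the YARN container directory, which the NodeManager removes once the application finishes. Two common workarounds, sketched here with placeholder paths and untested values: point the JVM error file somewhere outside the appcache, and/or delay YARN's cleanup.

    import org.apache.spark.SparkConf

    // Write the executor JVM's fatal-error log to a directory YARN does not clean up;
    // %p expands to the process id. The directory must exist and be writable on every node.
    val conf = new SparkConf()
      .setAppName("keep-error-report")
      .set("spark.executor.extraJavaOptions", "-XX:ErrorFile=/var/log/spark/hs_err_pid%p.log")

Alternatively, setting yarn.nodemanager.delete.debug-delay-sec (e.g. to 3600) in yarn-site.xml keeps finished container directories around long enough to copy the report out.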

Re: How to enable core dump in spark

2016-06-16 Thread prateek arora
-u) 241204 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited Regards Prateek On Thu, Jun 16, 2016 at 4:46 AM, Jacek Laskowski <ja...@japila.pl> wrote: > Hi, > > Can you make sure that the ulimit settings are applied to the Spark > pr

Re: How to enable core dump in spark

2016-06-02 Thread prateek arora
please help me to solve my problem Regards Prateek -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-enable-core-dump-in-spark-tp27065p27081.html

How to enable core dump in spark

2016-06-01 Thread prateek arora
om/bugreport/crash.jsp # so how can I enable core dumps and save them somewhere? Regards Prateek -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-enable-core-dump-in-spark-tp27065.html

Re: How to get and save core dump of native library in executors

2016-05-16 Thread prateek arora
Please help to solve my problem. Regards Prateek -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-get-and-save-core-dump-of-native-library-in-executors-tp26945p26967.html

Re: How to get and save core dump of native library in executors

2016-05-13 Thread prateek arora
I am running my cluster on Ubuntu 14.04. Regards Prateek -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-get-and-save-core-dump-of-native-library-in-executors-tp26945p26952.html

Re: How to get and save core dump of native library in executors

2016-05-12 Thread prateek arora
ubuntu 14.04 On Thu, May 12, 2016 at 2:40 PM, Ted Yu <yuzhih...@gmail.com> wrote: > Which OS are you using ? > > See http://en.linuxreviews.org/HOWTO_enable_core-dumps > > On Thu, May 12, 2016 at 2:23 PM, prateek arora <prateek.arora...@gmail.com > > wrote: >

How to get and save core dump of native library in executors

2016-05-12 Thread prateek arora
mation is saved as: # /yarn/nm/usercache/master/appcache/application_1462930975871_0004/container_1462930975871_0004_01_66/hs_err_pid20458.log # # If you would like to submit a bug report, please visit: # http://bugreport.sun.com/bugreport/crash.jsp # so how can i enable core dump and save it

spark 1.6 : RDD Partitions not distributed evenly to executors

2016-05-09 Thread prateek arora
.novalocal, partition 4, PROCESS_LOCAL, 2248 bytes) Is the above configuration the correct solution for the problem? And why is spark.shuffle.reduceLocality.enabled not mentioned in the Spark configuration document? Regards Prateek -- View this message in context: http://apache
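
The flag named in the excerpt is an internal setting (which is presumably why the configuration page does not list it); it can still be set like any other conf. A minimal sketch:

    import org.apache.spark.SparkConf

    // Internal/undocumented flag: disabling reduce-task locality preferences stops
    // the scheduler from packing reduce tasks onto a handful of "local" executors.
    val conf = new SparkConf()
      .setAppName("even-spread")
      .set("spark.shuffle.reduceLocality.enabled", "false")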

Creating a New Cassandra Table From a DataFrame Schema

2016-04-12 Thread Prateek .
( "test", "renamed", partitionKeyColumns = Some(Seq("user")), clusteringKeyColumns = Some(Seq("newcolumnname"))) The doc says: // Add spark connector specific methods to DataFrame. How can I achieve this? Thanks Prateek
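
Assuming a spark-cassandra-connector version that ships the DataFrame extension methods, they come in with the connector import; a rough, untested sketch built around the keyspace, table, and column names quoted above (the host and sample data are placeholders):

    import com.datastax.spark.connector._              // adds createCassandraTable to DataFrame
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("df-to-cassandra-table")
      .set("spark.cassandra.connection.host", "127.0.0.1")    // placeholder host
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Stand-in DataFrame; the column names follow the excerpt.
    val df = Seq(("alice", 1), ("bob", 2)).toDF("user", "newcolumnname")

    // Create test.renamed from df's schema with the quoted key choices...
    df.createCassandraTable(
      "test", "renamed",
      partitionKeyColumns = Some(Seq("user")),
      clusteringKeyColumns = Some(Seq("newcolumnname")))

    // ...then write the rows through the Cassandra data source.
    df.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "test", "table" -> "renamed"))
      .save()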

Starting Spark Job Remotely with Function Call

2016-04-12 Thread Prateek .
in my driver. I was wondering: what if we don't use spark-submit/spark-jobserver and instead call the function that executes the job directly? Will it create any implications in a production environment? Am I missing some important points? Thank You, Prateek
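
One middle ground between spark-submit and calling the job's function directly is the programmatic launcher API, which still goes through the normal submission path; a rough sketch (the jar path and class name are placeholders, not from the original mail):

    import org.apache.spark.launcher.SparkLauncher

    // Launches the application through the regular spark-submit machinery, so the
    // driver and executors are still provisioned by the cluster manager.
    val handle = new SparkLauncher()
      .setAppResource("/path/to/app-assembly.jar")   // placeholder
      .setMainClass("com.example.MyJob")             // placeholder
      .setMaster("yarn")
      .setConf(SparkLauncher.EXECUTOR_MEMORY, "2g")
      .startApplication()                            // returns a SparkAppHandle to monitor the job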

Re: is there any way to submit spark application from outside of spark cluster

2016-03-25 Thread prateek arora
Hi, thanks for the information; it will definitely solve my problem. I have one more question: if I want to launch a Spark application in a production environment, is there any other way so that multiple users can submit their jobs without having the Hadoop configuration? Regards Prateek On Fri

is there any way to submit spark application from outside of spark cluster

2016-03-25 Thread prateek arora
Hi, I want to submit a Spark application from outside of the Spark cluster, so please help me by providing information regarding this. Regards Prateek -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/is-there-any-way-to-submit-spark-application-from-outside

Re: How to distribute dependent files (.so , jar ) across spark worker nodes

2016-03-14 Thread prateek arora
or not ? Regards Prateek On Mon, Mar 14, 2016 at 2:31 PM, Jakob Odersky <ja...@odersky.com> wrote: > Have you tried setting the configuration > `spark.executor.extraLibraryPath` to point to a location where your > .so's are available? (Not sure if non-local files, such as HDFS,

Re: How to distribute dependent files (.so , jar ) across spark worker nodes

2016-03-14 Thread prateek arora
Hi, thanks for the information. But my problem is: if I want to write a Spark application which depends on third-party libraries like OpenCV, then what is the best approach to distribute all the .so and jar files of OpenCV across the whole cluster? Regards Prateek -- View this message in context

How to distribute dependent files (.so , jar ) across spark worker nodes

2016-03-11 Thread prateek arora
Hi, I have a multi-node cluster and my Spark jobs depend on a native library (.so files) and some jar files. Can someone please explain what the best ways are to distribute dependent files across nodes? Right now I copy the dependent files to all nodes using the Chef tool. Regards Prateek
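
A hedged sketch of the approaches discussed later in the thread, with placeholder paths: ship the jars with the job, and point executors at a directory holding the .so files.

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("opencv-job")
      // Directory on every worker that holds the native .so files (placeholder path).
      .set("spark.executor.extraLibraryPath", "/opt/opencv/lib")
      // Jars listed here are shipped to executors and added to their classpaths.
      .setJars(Seq("/opt/opencv/share/OpenCV/java/opencv-2411.jar"))   // placeholder

    val sc = new SparkContext(conf)
    // Small native libraries can also be shipped per job and located with SparkFiles.get:
    sc.addFile("/opt/opencv/share/OpenCV/java/libopencv_java2411.so")  // placeholder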

RE: Spark job for Reading time series data from Cassandra

2016-03-10 Thread Prateek .
partitions to be created. Following is Jira link: https://datastax-oss.atlassian.net/browse/SPARKC-208?jql=project%20%3D%20SPARKC%20AND%20fixVersion%20%3D%201.4.0-M2 Thanks , Prateek From: Matthias Niehoff [mailto:matthias.nieh...@codecentric.de] Sent: Thursday, March 10, 2016 9:28 PM To: Bryan Jeffrey

Spark job for Reading time series data from Cassandra

2016-03-10 Thread Prateek .
[Spark UI stage table excerpt: stage 0, submitted 2016/03/10 21:01:15, duration 9 s, tasks 137/770870] Thank You Prateek

Re: Configuring Ports for Network Security

2016-03-02 Thread Guru Prateek Pinnadhari
Thanks for your response. End users and developers in our scenario need terminal / SSH access to the cluster. So cluster isolation from external networks is not an option. We use a Hortonworks based hadoop cluster. Knox is useful but as users also have shell access, we need iptables. Even

Re: Spark REST API shows Error 503 Service Unavailable

2015-12-17 Thread prateek arora
(yet to be released) onwards." On Thu, Dec 17, 2015 at 3:24 PM, Vikram Kone <vikramk...@gmail.com> wrote: > Hi Prateek, > Were you able to figure why this is happening? I'm seeing the same error > on my spark standalone cluster. > > Any pointers anyone? > > On Fri, D

Spark REST API shows Error 503 Service Unavailable

2015-12-11 Thread prateek arora
Hi, I am trying to access Spark using the REST API but got the below error. Command: curl http://:18088/api/v1/applications Response: Error 503 Service Unavailable HTTP ERROR 503 Problem accessing /api/v1/applications. Reason: Service Unavailable Caused by:

can i process multiple batch in parallel in spark streaming

2015-12-09 Thread prateek arora
- processing. It seems batches are pushed into a queue and handled in FIFO order. Is it possible for all my active batches to start processing in parallel? Regards Prateek -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/can-i-process-multiple-batch-in-parallel
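
The queued, FIFO behaviour described here is the default. There is an undocumented setting that lets several batches' jobs run at once; it is only reasonable when batches are independent of each other. A minimal sketch:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("parallel-batches")
      // Undocumented/experimental: number of streaming jobs allowed to run concurrently.
      // Only safe if batches do not depend on each other's results or ordering.
      .set("spark.streaming.concurrentJobs", "4")
    val ssc = new StreamingContext(conf, Seconds(10))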

Re: can i process multiple batch in parallel in spark streaming

2015-12-09 Thread prateek arora
Hi, thanks. In my scenario the batches are independent, so is it safe to use in a production environment? Regards Prateek On Wed, Dec 9, 2015 at 11:39 AM, Ted Yu <yuzhih...@gmail.com> wrote: > Have you seen this thread ? > > http://search-hadoop.com/m/q3RTtgSGrobJ3Je > > On Wed,

can i write only RDD transformation into hdfs or any other storage system

2015-12-08 Thread prateek arora
Hi, is it possible in Spark to write only an RDD transformation into HDFS or any other storage system? Regards Prateek -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/can-i-write-only-RDD-transformation-into-hdfs-or-any-other-storage-system-tp25637.html

is Multiple Spark Contexts is supported in spark 1.5.0 ?

2015-12-04 Thread prateek arora
supported, then do we need to set the "spark.driver.allowMultipleContexts" configuration parameter? Regards Prateek -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/is-Multiple-Spark-Contexts-is-supported-in-spark-1-5-0-tp25568.html
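
Multiple active SparkContexts in one JVM are not supported; the flag below only downgrades the "multiple SparkContexts" check, it does not make them safe. For completeness, a sketch of where the flag goes:

    import org.apache.spark.{SparkConf, SparkContext}

    // Suppresses the multiple-contexts exception; concurrent contexts in one JVM
    // remain unsupported and can misbehave.
    val conf = new SparkConf()
      .setAppName("second-context")
      .set("spark.driver.allowMultipleContexts", "true")
    val sc2 = new SparkContext(conf)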

Re: is Multiple Spark Contexts is supported in spark 1.5.0 ?

2015-12-04 Thread prateek arora
Hi Ted, thanks for the information. Is there any way that two different Spark applications can share their data? Regards Prateek On Fri, Dec 4, 2015 at 9:54 AM, Ted Yu <yuzhih...@gmail.com> wrote: > See Josh's response in this thread: > > > http://search-hadoop.com/m/q3RTt1z1hU

Re: is Multiple Spark Contexts is supported in spark 1.5.0 ?

2015-12-04 Thread prateek arora
Thanks ... Is there any way my second application can run in parallel and wait to fetch data from HBase or any other data storage system? Regards Prateek On Fri, Dec 4, 2015 at 10:24 AM, Ted Yu <yuzhih...@gmail.com> wrote: > How about using a NoSQL data store such as HBase :-) >

how to spark streaming application start working on next batch before completing on previous batch .

2015-12-03 Thread prateek arora
application start working on the next batch before completing the previous batch, meaning batches will execute in parallel. Please help me to solve this problem. Regards Prateek -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/how-to-spark-streaming-application-start

Spark DStream Data stored out of order in Cassandra

2015-11-30 Thread Prateek .
, out-of-order data in the Cassandra schema. Does Spark Streaming provide any functionality to retain order, or do we need to implement some sorting based on the timestamp of arrival? Regards, Prateek

Re: how can evenly distribute my records in all partition

2015-11-18 Thread prateek arora
a and how your keys are > spread currently. Do you want to compute something per day, per week etc. > Based on that, return a partition number. You could use mod 30 or some such > function to get the partitions. > On Nov 18, 2015 5:17 AM, "prateek arora" <prateek.arora...@g

Re: how can evenly distribute my records in all partition

2015-11-17 Thread prateek arora
wrote: > You can write your own custom partitioner to achieve this > > Regards > Sab > On 17-Nov-2015 1:11 am, "prateek arora" <prateek.arora...@gmail.com> > wrote: > >> Hi >> >> I have a RDD with 30 record ( Key/value pair ) and running 30 exec

Re: how can evenly distribute my records in all partition

2015-11-17 Thread prateek arora
custom partitioner in my case: my parent RDD has 4 partitions, the RDD key is a timestamp, and the value is a JPEG byte array. Regards Prateek On Tue, Nov 17, 2015 at 9:28 AM, Ted Yu <yuzhih...@gmail.com> wrote: > Please take a look at the following for example: > > ./core/src/main/scala/o
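
A minimal, illustrative sketch of a deterministic custom Partitioner for the situation described above (30 records with timestamp keys spread over 30 partitions); the spreading rule is only an example:

    import org.apache.spark.{Partitioner, SparkConf, SparkContext}

    // Spreads Long timestamp keys deterministically over numPartitions, so 30 distinct
    // keys over 30 partitions come out roughly one per partition.
    class TimestampPartitioner(override val numPartitions: Int) extends Partitioner {
      override def getPartition(key: Any): Int = key match {
        case ts: Long => Math.floorMod(ts, numPartitions.toLong).toInt
        case other    => Math.floorMod(other.hashCode, numPartitions)
      }
    }

    val sc = new SparkContext(new SparkConf().setAppName("even-spread"))
    // Stand-in for the real RDD of (timestamp, jpegBytes) pairs described in the thread.
    val pairRdd = sc.parallelize((1L to 30L).map(ts => (ts, Array[Byte]())))
    val spread = pairRdd.partitionBy(new TimestampPartitioner(30))
    println(spread.glom().map(_.length).collect().toSeq)   // records per partition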

how can evenly distribute my records in all partition

2015-11-16 Thread prateek arora
, some get 1 record, and some do not get any record. Is there any way in Spark so I can evenly distribute my records across all partitions? Regards Prateek -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/how-can-evenly-distribute-my-records-in-all-partition

RE: Streaming Application Unable to get Stream from Kafka

2015-10-12 Thread Prateek .
Hi Terry, Thanks a lot. It was the resource problem , Spark was able to get only one thread. It’s working fine now with local[*]. Cheers, Prateek From: Terry Hoo [mailto:hujie.ea...@gmail.com] Sent: Saturday, October 10, 2015 9:51 AM To: Prateek . <prat...@aricent.com> Cc
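
For reference, the root cause described here is that a receiver occupies one core, so master "local" (a single thread) leaves nothing for batch processing; a minimal sketch of the fix:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // "local" gives the receiver the only thread and batches never run;
    // "local[*]" (or at least "local[2]") leaves cores free for processing.
    val conf = new SparkConf().setMaster("local[*]").setAppName("kafka-stream")
    val ssc = new StreamingContext(conf, Seconds(5))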

RE: Streaming Application Unable to get Stream from Kafka

2015-10-09 Thread Prateek .
the class serializable. Now the application is working fine in standalone mode, but it is not able to receive data in local mode, with the below-mentioned log. What is happening internally? If anyone has some insights, please share! Thank you in advance. Regards, Prateek From: Prateek . Sent: Friday

Streaming Application Unable to get Stream from Kafka

2015-10-09 Thread Prateek .
/09 18:37:24 INFO BlockGenerator: Pushed block input-0-1444396043800 Thanks in advance Prateek

RE: DStream Transformation to save JSON in Cassandra 2.1

2015-10-06 Thread Prateek .
238331780492) | Some(0.5235250642853548) I am not able to figure out how to map the DStream[Coordinate] to the columns in the schema. Thank You Prateek -Original Message- From: Jean-Baptiste Onofré [mailto:j...@nanthrax.net] Sent: Monday, October 05, 2015 7:58 PM To: user@spark.apache.org Subject: Re

DStream Transformation to save JSON in Cassandra 2.1

2015-10-05 Thread Prateek .
I need to store each coordinate value in the below Cassandra schema: CREATE TABLE iotdata.coordinate ( id text PRIMARY KEY, ax double, ay double, az double, oa double, ob double, oz double ). For this, what transformations do I need to apply before I execute saveToCassandra()? Thank You, Prateek
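
Assuming a Coordinate case class whose field names match the columns above, the connector's streaming import adds saveToCassandra to DStreams and maps fields to columns by name. A rough, untested sketch (the host and the demo queue stream are placeholders for the real JSON/Kafka source):

    import com.datastax.spark.connector._
    import com.datastax.spark.connector.streaming._   // adds saveToCassandra to DStreams
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import scala.collection.mutable

    case class Coordinate(id: String, ax: Double, ay: Double, az: Double,
                          oa: Double, ob: Double, oz: Double)

    val conf = new SparkConf()
      .setAppName("coords-to-cassandra")
      .set("spark.cassandra.connection.host", "127.0.0.1")    // placeholder host
    val ssc = new StreamingContext(conf, Seconds(10))

    // Stand-in stream; in the real application this would come from parsing the incoming JSON.
    val demo = mutable.Queue(ssc.sparkContext.parallelize(Seq(
      Coordinate("sensor-1", 0.1, 0.2, 0.3, 0.4, 0.5, 0.6))))
    val coords = ssc.queueStream(demo)

    // Case-class fields are matched by name to the columns of iotdata.coordinate.
    coords.saveToCassandra("iotdata", "coordinate",
      SomeColumns("id", "ax", "ay", "az", "oa", "ob", "oz"))
    ssc.start(); ssc.awaitTermination()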

Sprk RDD : want to combine elements that have approx same keys

2015-09-10 Thread prateek arora
range, the same with keys 214, 213, 212, and so on. How can I do this? Regards prateek -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Sprk-RDD-want-to-combine-elements-that-have-approx-same-keys-tp24644.html
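
One simple way to treat nearby integer keys as "the same" is to bucket the key before combining; a rough sketch with an arbitrary bucket width (the data, the width, and the summing combine function are all illustrative):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("bucket-keys"))
    val rdd = sc.parallelize(Seq((212, 1.0), (213, 2.0), (214, 3.0), (500, 4.0)))

    // Keys that fall into the same fixed-width bucket collapse to one key and are then
    // combined with an ordinary reduceByKey. Bucket boundaries are arbitrary, so two
    // nearby keys can still straddle a boundary; pick the width (or a domain-specific
    // bucketing rule) to match what "approximately the same key" means here.
    val bucketWidth = 3
    val combined = rdd
      .map { case (k, v) => (k / bucketWidth, v) }
      .reduceByKey(_ + _)                  // placeholder combine: summing the values
    combined.collect().foreach(println)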

get java.io.FileNotFoundException when use addFile Function

2015-07-15 Thread prateek arora
I am trying to write a simple program using the addFile function but am getting an error on my worker node that the file does not exist. Stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, slave2.novalocal): java.io.FileNotFoundException: File

get java.io.FileNotFoundException when use addFile Function

2015-07-15 Thread prateek arora
("file://" + SparkFiles.get("csv_ip.csv")) inFile.take(10).foreach(println) Please help me resolve the error. Thanks in advance. Regards prateek
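
A rough sketch of the addFile/SparkFiles pattern with the string literals restored (the source location is a placeholder). Note that SparkFiles.get returns the node-local path of the downloaded copy, so building a file:// URI on the driver and reading it with textFile only works if that exact path also exists on every worker; that mismatch is one common way to end up with exactly this FileNotFoundException.

    import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

    val sc = new SparkContext(new SparkConf().setAppName("addfile-example"))

    // Ship the file to every node; each node fetches it into its own SparkFiles directory.
    sc.addFile("hdfs:///data/csv_ip.csv")                    // placeholder source location

    // SparkFiles.get resolves the local copy on the machine where it is called (here, the driver).
    val inFile = sc.textFile("file://" + SparkFiles.get("csv_ip.csv"))
    inFile.take(10).foreach(println)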

Saving RDD into cassandra keyspace.

2015-07-10 Thread Prateek .
Hi, I am a beginner to Spark. I want to save each word and its count to a Cassandra keyspace, and I wrote the following code: import org.apache.spark.SparkContext import org.apache.spark.SparkContext._ import org.apache.spark.SparkConf import com.datastax.spark.connector._ object SparkWordCount { def
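
A minimal sketch of how such a word count can end with a Cassandra write, assuming a table created up front with matching columns (the keyspace, table, host, and input path here are illustrative, not from the original mail):

    import com.datastax.spark.connector._
    import org.apache.spark.{SparkConf, SparkContext}

    object SparkWordCount {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("SparkWordCount")
          .set("spark.cassandra.connection.host", "127.0.0.1")   // placeholder host
        val sc = new SparkContext(conf)

        val counts = sc.textFile("hdfs:///data/input.txt")        // placeholder path
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)

        // Assumes: CREATE TABLE wordcount.words (word text PRIMARY KEY, count int);
        counts.saveToCassandra("wordcount", "words", SomeColumns("word", "count"))
      }
    }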

SelectChannelConnector@0.0.0.0:4040: java.net.BindException: Address already in use when running spark-shell

2015-07-10 Thread Prateek .
Hi, I am running a single spark-shell but observing this error when I execute val sc = new SparkContext(conf): 15/07/10 15:42:56 WARN AbstractLifeCycle: FAILED SelectChannelConnector@0.0.0.0:4040: java.net.BindException: Address already in use java.net.BindException: Address already in use
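
As the reply further down the list notes, spark-shell already creates a SparkContext (sc) whose UI is bound to port 4040, so constructing a second context tries to bind the same port. A minimal sketch:

    // Inside spark-shell the context already exists -- just use it:
    sc.parallelize(1 to 10).count()

    // If a second application really must run on the same machine, give it another UI port
    // (illustrative; the shell's own context should normally just be reused):
    // val conf = new SparkConf().setAppName("second-app").set("spark.ui.port", "4041")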

RE: Saving RDD into cassandra keyspace.

2015-07-10 Thread Prateek .
Hi, Thanks Todd..the link is really helpful to get started. ☺ -Prateek From: Todd Nist [mailto:tsind...@gmail.com] Sent: Friday, July 10, 2015 4:43 PM To: Prateek . Cc: user@spark.apache.org Subject: Re: Saving RDD into cassandra keyspace. I would strongly encourage you to read the docs

RE: SelectChannelConnector@0.0.0.0:4040: java.net.BindException: Address already in use when running spark-shell

2015-07-10 Thread Prateek .
Thanks Akhil! I got it . ☺ From: Akhil Das [mailto:ak...@sigmoidanalytics.com] Sent: Friday, July 10, 2015 4:02 PM To: Prateek . Cc: user@spark.apache.org Subject: Re: SelectChannelConnector@0.0.0.0:4040: java.net.BindException: Address already in use when running spark-shell that's because sc

Getting started with spark-scala developemnt in eclipse.

2015-07-08 Thread Prateek .
Hi, I am a beginner to Scala and Spark. I am trying to set up an Eclipse environment to develop a Spark program in Scala, then take its jar for spark-submit. How shall I start? To start, my task includes setting up Eclipse for Scala and Spark, getting dependencies resolved, and building the project using
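
A minimal build.sbt sketch for this kind of setup (all versions are illustrative for that time frame); with the sbteclipse plugin, "sbt eclipse" generates the Eclipse project files, and "sbt package" (or sbt-assembly) produces the jar handed to spark-submit:

    // build.sbt -- minimal Spark project definition (versions are illustrative)
    name := "spark-example"
    version := "0.1"
    scalaVersion := "2.10.5"

    // "provided" keeps spark-core out of the packaged jar, since the cluster supplies it.
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.4.1" % "provided"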

Re: connector for CouchDB

2015-01-29 Thread prateek arora
I am also looking for a connector for CouchDB in Spark. Did you find anything? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/connector-for-CouchDB-tp18630p21422.html

spark connector for CouchDB

2015-01-29 Thread prateek arora
I am looking for a Spark connector for CouchDB; please help me. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-connector-for-CouchDB-tp21421.html

Re: connector for CouchDB

2015-01-29 Thread prateek arora
Yes please, but I am new to Spark and CouchDB. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/connector-for-CouchDB-tp18630p21428.html

Re: connector for CouchDB

2015-01-29 Thread prateek arora
I can also switch to MongoDB if Spark has support for it. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/connector-for-CouchDB-tp18630p21429.html