We are using Parquet and doing a simple write and read.
For writing: ds.write().parquet(outputPath); // this writes 40K part files
For reading: sqlContext.read().parquet(inputPath).javaRDD() // here we read the same 40K part files
Regards,
Prateek Rajput
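For reference, a minimal Scala sketch of the same write/read with the part-file count brought down before writing; the paths, the target of 200, and the SparkSession named spark are assumptions:

    val ds = spark.read.parquet("hdfs:///data/input")      // read the existing 40K part files
    ds.coalesce(200).write.parquet("hdfs:///data/output")  // coalesce into far fewer, larger files

coalesce avoids a full shuffle; repartition(200) would rebalance more evenly at the cost of a shuffle.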
Hi all,
Please share if anyone has faced the same problem. There are many similar
issues on the web, but I did not find any solution or an explanation of why
this happens. Any help would be really appreciated.
Regards,
Prateek
On Mon, Apr 29, 2019 at 3:18 PM Prateek Rajput
wrote:
> I checked and removed 0-sized files
On Tue, Apr 30, 2019 at 6:48 PM Vatsal Patel
wrote:
> Issue:
>
> When I am reading a sequence file in Spark, I can specify the number of
> partitions as an argument to the API, like below:
> public <K, V> JavaPairRDD<K, V> sequenceFile(String path, Class<K> keyClass, Class<V> valueClass, int minPartitions)
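For reference, a minimal Scala sketch of the same call with an explicit minPartitions; the path, the key/value types, and the value 96 are illustrative:

    import org.apache.hadoop.io.{LongWritable, Text}

    // minPartitions (96 here) is only a lower bound on the partition count
    val rdd = sc.sequenceFile("hdfs:///data/seq", classOf[LongWritable], classOf[Text], 96)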
no such issue occurs; it happens only in the case of Spark.
On Mon, Apr 29, 2019 at 2:50 PM Deepak Sharma wrote:
> This can happen if the file size is 0
>
> On Mon, Apr 29, 2019 at 2:28 PM Prateek Rajput
> wrote:
>
>> Hi guys,
>> I am getting this strange error again and again
core_2.11
Regards,
Prateek
Am I right in guessing that currently we ignore inverse offers completely?
Thanks,
--Prateek
I set ulimit -c to unlimited, but when my Spark application crashes, it shows
the error "Failed to write core dump. Core dumps have been disabled. To enable
core dumping, try "ulimit -c unlimited" before starting Java again".
Regards
Prateek
On Wed, Jun 29, 2016 at 9:30 PM, dhruve ashar <dhruveas...@gmail.com> wrote:
I am not able to find the
"/yarn/nm/usercache/ubuntu/appcache/application_1467236060045_0001/container_1467236060045_0001_01_03/hs_err_pid12207.log"
file. It is deleted automatically after the Spark application
finishes.
How can I retain the report file? I am running Spark on YARN.
Regards
max user processes (-u) 241204
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Regards
Prateek
On Thu, Jun 16, 2016 at 4:46 AM, Jacek Laskowski <ja...@japila.pl> wrote:
> Hi,
>
> Can you make sure that the ulimit settings are applied to the Spark
> processes?
Please help me to solve my problem.
Regards
Prateek
# http://bugreport.sun.com/bugreport/crash.jsp
#
So how can I enable core dumps and save them somewhere?
Regards
Prateek
Please help me to solve my problem.
Regards
Prateek
I am running my cluster on Ubuntu 14.04
Regards
Prateek
Ubuntu 14.04
On Thu, May 12, 2016 at 2:40 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> Which OS are you using ?
>
> See http://en.linuxreviews.org/HOWTO_enable_core-dumps
>
> On Thu, May 12, 2016 at 2:23 PM, prateek arora <prateek.arora...@gmail.com>
> wrote:
>
# An error report file with more information is saved as:
# /yarn/nm/usercache/master/appcache/application_1462930975871_0004/container_1462930975871_0004_01_66/hs_err_pid20458.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.sun.com/bugreport/crash.jsp
#
So how can I enable core dumps and save them somewhere?
.novalocal, partition 4,PROCESS_LOCAL, 2248 bytes)
Is the above configuration the correct solution for the problem? And why is
spark.shuffle.reduceLocality.enabled not mentioned in the Spark configuration
documentation?
Regards
Prateek
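For reference, a sketch of how the setting in question is applied; it is an internal flag rather than part of the public configuration surface, which would explain its absence from the documentation:

    // internal, undocumented flag; shown only to illustrate how it is set
    val conf = new SparkConf().set("spark.shuffle.reduceLocality.enabled", "false")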
(
  "test",
  "renamed",
  partitionKeyColumns = Some(Seq("user")),
  clusteringKeyColumns = Some(Seq("newcolumnname")))
The doc says:
// Add spark connector specific methods to DataFrame
How can I achieve this?
Thanks
Prateek
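A hedged sketch of the full call: importing the connector's implicits is what "adds spark connector specific methods to DataFrame"; the DataFrame df is assumed to hold the renamed data:

    import com.datastax.spark.connector._  // brings the DataFrame extension methods into scope

    df.createCassandraTable(
      "test",
      "renamed",
      partitionKeyColumns = Some(Seq("user")),
      clusteringKeyColumns = Some(Seq("newcolumnname")))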
"DISCL
in my driver.
I was wondering what happens if we don't use spark-submit or the Spark job
server and instead call the function that executes the job directly.
Will that have any implications in a production environment? Am I missing
some important points?
Thank You,
Prateek
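One route that avoids the spark-submit script without calling the job function in-process is the programmatic launcher; a minimal sketch, where the jar path, main class, and master are placeholders:

    import org.apache.spark.launcher.SparkLauncher

    val proc = new SparkLauncher()
      .setAppResource("/path/to/app.jar")
      .setMainClass("com.example.MyJob")
      .setMaster("yarn-cluster")
      .launch()
    proc.waitFor()  // block until the application exits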
"DISCLAIMER: This message is propri
Hi
Thanks for the information. It will definitely solve my problem.
I have one more question: if I want to launch a Spark application in a
production environment, is there a way for multiple users to submit
their jobs without having the Hadoop configuration?
Regards
Prateek
On Fri
Hi
I want to submit a Spark application from outside of the Spark cluster, so
please help me with information on how to do this.
Regards
Prateek
or not ?
Regards
Prateek
On Mon, Mar 14, 2016 at 2:31 PM, Jakob Odersky <ja...@odersky.com> wrote:
> Have you tried setting the configuration
> `spark.executor.extraLibraryPath` to point to a location where your
> .so's are available? (Not sure if non-local files, such as HDFS,
Hi
Thanks for the information.
But my problem is: if I want to write a Spark application that depends on
third-party libraries like OpenCV, what is the best approach to distribute
all the .so and jar files of OpenCV across the cluster?
Regards
Prateek
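A hedged sketch of the approach Jakob suggests, with placeholder install paths: have the native OpenCV libraries present on every node (e.g. via Chef, as described below) and point the executors at them, shipping the Java bindings jar with the application:

    val conf = new SparkConf()
      .set("spark.executor.extraLibraryPath", "/opt/opencv/lib")  // directory holding the .so files
      .setJars(Seq("/opt/opencv/opencv.jar"))                     // OpenCV Java bindings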
Hi
I have a multi-node cluster, and my Spark jobs depend on a native
library (.so files) and some jar files.
Can someone please explain the best ways to distribute dependent
files across nodes?
Right now I copy the dependent files to all nodes using Chef.
Regards
Prateek
partitions to be created.
Following is the Jira link:
https://datastax-oss.atlassian.net/browse/SPARKC-208?jql=project%20%3D%20SPARKC%20AND%20fixVersion%20%3D%201.4.0-M2
Thanks ,
Prateek
From: Matthias Niehoff [mailto:matthias.nieh...@codecentric.de]
Sent: Thursday, March 10, 2016 9:28 PM
To: Bryan Jeffrey
[Spark UI stage detail residue: submitted 2016/03/10 21:01:15, duration 9 s, tasks succeeded 137/770870]
Thank You
Prateek
"DISCLAIMER: This message is proprietary to Aricent and is intended solely for
the use of the individual to whom it is addressed. It may contain privileged or
confidential information and should not
Thanks for your response.
End users and developers in our scenario need terminal/SSH access to the
cluster, so cluster isolation from external networks is not an option.
We use a Hortonworks-based Hadoop cluster. Knox is useful, but as users also
have shell access, we need iptables.
Even
(yet to be released) onwards."
On Thu, Dec 17, 2015 at 3:24 PM, Vikram Kone <vikramk...@gmail.com> wrote:
> Hi Prateek,
> Were you able to figure why this is happening? I'm seeing the same error
> on my spark standalone cluster.
>
> Any pointers anyone?
>
> On Fri, D
Hi
I am trying to access Spark using the REST API but got the error below.
Command:
curl http://:18088/api/v1/applications
Response:
Error 503 Service Unavailable
HTTP ERROR 503
Problem accessing /api/v1/applications. Reason:
Service Unavailable
Caused by:
-
processing.
It seems batches are pushed into a queue and handled in FIFO order. Is it
possible for all my active batches to start processing in parallel?
Regards
Prateek
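The knob usually mentioned for this is spark.streaming.concurrentJobs; it is undocumented, so treat the following as a hedged sketch, and note it is only safe when batches are truly independent:

    val conf = new SparkConf()
      .setAppName("streaming-app")
      .set("spark.streaming.concurrentJobs", "4")  // 4 is illustrative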
Hi, thanks.
In my scenario the batches are independent, so is it safe to use this in a
production environment?
Regards
Prateek
On Wed, Dec 9, 2015 at 11:39 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> Have you seen this thread ?
>
> http://search-hadoop.com/m/q3RTtgSGrobJ3Je
>
> On Wed,
Hi
Is it possible in Spark to write only an RDD transformation to HDFS or any
other storage system?
Regards
Prateek
supported, then do we need to set the
"spark.driver.allowMultipleContexts" configuration parameter?
Regards
Prateek
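For reference, a sketch of how that parameter is set; multiple contexts in one JVM are not officially supported, and the flag merely relaxes Spark's safety check rather than making the pattern recommended:

    val conf = new SparkConf()
      .setAppName("app")
      .set("spark.driver.allowMultipleContexts", "true")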
Hi Ted
Thanks for the information.
Is there any way for two different Spark applications to share their data?
Regards
Prateek
On Fri, Dec 4, 2015 at 9:54 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> See Josh's response in this thread:
>
>
> http://search-hadoop.com/m/q3RTt1z1hU
Thanks...
Is there any way for my second application to run in parallel and wait to
fetch data from HBase or any other data storage system?
Regards
Prateek
On Fri, Dec 4, 2015 at 10:24 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> How about using NoSQL data store such as HBase :-)
>
>
application start working on the next batch
before completing the previous one, i.e. with batches executing in
parallel.
Please help me solve this problem.
Regards
Prateek
, out-of-order data in
the Cassandra schema. Does Spark Streaming provide any functionality to retain
order, or do we need to implement some sorting based on the timestamp of arrival?
Regards,
Prateek
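A minimal sketch of the sorting option, assuming a DStream of events with a timestamp field and a matching Cassandra table (all names illustrative); note Cassandra upserts by primary key, so ordering matters mainly when later writes must win:

    import com.datastax.spark.connector._  // adds saveToCassandra to RDDs

    eventStream.foreachRDD { rdd =>
      // sort each micro-batch by arrival timestamp before writing
      rdd.sortBy(_.timestamp).saveToCassandra("mykeyspace", "events")
    }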
"DISCLAIMER: This message is proprietary to Aricent and is intended solely for
the use of the indiv
> your data and how your keys are
> spread currently. Do you want to compute something per day, per week, etc.?
> Based on that, return a partition number. You could use mod 30 or some such
> function to get the partitions.
> On Nov 18, 2015 5:17 AM, "prateek arora" <prateek.arora...@g
wrote:
> You can write your own custom partitioner to achieve this
>
> Regards
> Sab
> On 17-Nov-2015 1:11 am, "prateek arora" <prateek.arora...@gmail.com>
> wrote:
>
>> Hi
>>
>> I have an RDD with 30 records (key/value pairs) and am running 30 executors.
custom partitioner in my case:
my parent RDD has 4 partitions; the RDD key is a timestamp and the value is a
JPEG byte array.
Regards
Prateek
On Tue, Nov 17, 2015 at 9:28 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> Please take a look at the following for example:
>
> ./core/src/main/scala/o
, some get 1 record and some do not get any
record.
Is there any way in Spark to evenly distribute my records across all
partitions?
Regards
Prateek
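A sketch of the custom-partitioner approach suggested above; with 30 records and 30 partitions, a non-negative modulo on the key's hash spreads roughly one record per partition (the class name is illustrative):

    import org.apache.spark.Partitioner

    class EvenPartitioner(override val numPartitions: Int) extends Partitioner {
      def getPartition(key: Any): Int = {
        val h = key.hashCode % numPartitions
        if (h < 0) h + numPartitions else h  // keep the result in [0, numPartitions)
      }
    }

    // usage on a pair RDD: rdd.partitionBy(new EvenPartitioner(30))

For a hard guarantee of exactly one record per partition, keying by zipWithIndex before partitionBy also works.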
Hi Terry,
Thanks a lot. It was a resource problem: Spark was able to get only one
thread. It's working fine now with local[*].
Cheers,
Prateek
From: Terry Hoo [mailto:hujie.ea...@gmail.com]
Sent: Saturday, October 10, 2015 9:51 AM
To: Prateek . <prat...@aricent.com>
Cc
the class
serializable.
Now the application works fine in standalone mode, but it is not able to
receive data in local mode, with the log mentioned below.
What is happening internally? If anyone has any insights, please share!
Thank You in advance
Regards,
Prateek
From: Prateek .
Sent: Friday
15/10/09 18:37:24 INFO BlockGenerator: Pushed block input-0-1444396043800
Thanks in advance
Prateek
"DISCLAIMER: This message is proprietary to Aricent and is intended solely for
the use of the individual to whom it is addressed. It may contain privileged or
confidential information and s
238331780492) | Some(0.5235250642853548)
I am not able to figure out how to map the DStream[Coordinate] to the columns
in the schema.
Thank You
Prateek
-Original Message-
From: Jean-Baptiste Onofré [mailto:j...@nanthrax.net]
Sent: Monday, October 05, 2015 7:58 PM
To: user@spark.apache.org
Subject: Re
I need to store each coordinate's
values in the Cassandra schema below:
CREATE TABLE iotdata.coordinate (
id text PRIMARY KEY, ax double, ay double, az double, oa double, ob double,
oz double
)
What transformations do I need to apply before I execute
saveToCassandra()?
Thank You,
Prateek
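A hedged sketch of one mapping, assuming each DStream element can be modeled as a case class whose fields match the columns above:

    import com.datastax.spark.connector._

    case class Coordinate(id: String, ax: Double, ay: Double, az: Double,
                          oa: Double, ob: Double, oz: Double)

    coordinateStream.foreachRDD { rdd =>
      rdd.saveToCassandra("iotdata", "coordinate",
        SomeColumns("id", "ax", "ay", "az", "oa", "ob", "oz"))
    }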
range,
and the same with keys 214, 213, 212, and so on.
How can I do this?
regards
prateek
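If the goal is to combine elements whose integer keys are approximately equal (212, 213, 214 landing in one bucket), one sketch is to bucket keys by integer division before reducing; the width 3 and the sum are illustrative, and rdd is assumed to be an RDD[(Int, Int)]:

    val combined = rdd
      .map { case (k, v) => (k / 3, v) }  // nearby keys share a bucket
      .reduceByKey(_ + _)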
I am trying to write a simple program using the addFile function, but I am
getting an error on my worker node that the file does not exist:
stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost
task 0.3 in stage 0.0 (TID 3, slave2.novalocal):
java.io.FileNotFoundException: File
(file://+SparkFiles.get(csv_ip.csv))
inFile.take(10).foreach(println)
Please help me resolve this error. Thanks in advance.
Regards
prateek
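The FileNotFoundException above suggests the expression was passed as one literal string; a minimal corrected sketch (the HDFS source path is a placeholder) quotes the pieces separately:

    import org.apache.spark.SparkFiles

    sc.addFile("hdfs:///data/csv_ip.csv")  // ship the file to every node
    val inFile = sc.textFile("file://" + SparkFiles.get("csv_ip.csv"))
    inFile.take(10).foreach(println)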
Hi,
I am a beginner to Spark. I want to save each word and its count to a
Cassandra keyspace, so I wrote the following code:
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import com.datastax.spark.connector._

object SparkWordCount {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SparkWordCount"))
    val counts = sc.textFile(args(0)).flatMap(_.split("\\s+")).map((_, 1)).reduceByKey(_ + _)
    counts.saveToCassandra("wordks", "wordcount", SomeColumns("word", "count")) // keyspace/table names are assumed
  }
}
Hi,
I am running a single spark-shell, but I observe this error when I run val sc =
new SparkContext(conf):
15/07/10 15:42:56 WARN AbstractLifeCycle: FAILED
SelectChannelConnector@0.0.0.0:4040: java.net.BindException: Address already in
use
java.net.BindException: Address already in use
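spark-shell already owns a SparkContext (sc) whose web UI is bound to port 4040, so a second context collides with it. If a separate application must run alongside the shell, a sketch of moving its UI to another port (4041 is just an example):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("second-app")
      .set("spark.ui.port", "4041")  // avoid the shell's 4040
    val sc2 = new SparkContext(conf)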
Hi,
Thanks Todd, the link is really helpful to get started. ☺
-Prateek
From: Todd Nist [mailto:tsind...@gmail.com]
Sent: Friday, July 10, 2015 4:43 PM
To: Prateek .
Cc: user@spark.apache.org
Subject: Re: Saving RDD into cassandra keyspace.
I would strongly encourage you to read the docs
Thanks Akhil! I got it. ☺
From: Akhil Das [mailto:ak...@sigmoidanalytics.com]
Sent: Friday, July 10, 2015 4:02 PM
To: Prateek .
Cc: user@spark.apache.org
Subject: Re: SelectChannelConnector@0.0.0.0:4040: java.net.BindException:
Address already in use when running spark-shell
that's because spark-shell already creates a SparkContext (sc), which binds the web UI to port 4040.
Hi
I am a beginner to Scala and Spark. I am trying to set up an Eclipse
environment to develop Spark programs in Scala, then build a jar for
spark-submit.
How shall I start? My tasks include setting up Eclipse for Scala and
Spark, getting dependencies resolved, and building the project using
I am also looking for a CouchDB connector for Spark. Did you find anything?
I am looking for a Spark connector for CouchDB; please help me.
Yes please, but I am new to Spark and CouchDB.
I can also switch to MongoDB if Spark has support for it.