Hi,
Could someone please respond to this?
Thanks
Pankaj Bhootra
On Sun, 7 Mar 2021, 01:22 Pankaj Bhootra, wrote:
> Hello Team
>
> I am new to Spark and this question may be a possible duplicate of the
> issue highlighted here: https://issues.apache.org/jira/browse/SPARK-9347
using csv files to parquet, but from my
hands-on experience so far, it seems that Parquet's read time is slower than
CSV's? This seems to contradict the popular opinion that Parquet performs
better in terms of both computation and storage.
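A quick way to sanity-check this is to time a full scan of both formats. Below is a minimal sketch (the paths, header option, and local master are illustrative assumptions, not from the original thread):

```scala
import org.apache.spark.sql.SparkSession

object ReadTimeComparison {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("csv-vs-parquet-read")
      .master("local[*]")
      .getOrCreate()

    // Simple timing helper: runs the body once and prints elapsed seconds.
    def time[A](label: String)(body: => A): A = {
      val start = System.nanoTime()
      val result = body
      println(f"$label took ${(System.nanoTime() - start) / 1e9}%.2f s")
      result
    }

    // count() forces a full read, so the scan actually happens.
    // Paths are placeholders; point them at the same dataset in both formats.
    time("csv read")(spark.read.option("header", "true").csv("/data/input.csv").count())
    time("parquet read")(spark.read.parquet("/data/input.parquet").count())

    spark.stop()
  }
}
```

Note that Parquet's advantage shows up most clearly when only a few columns of a wide schema are selected; a full count() over all columns can narrow the gap.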
Thanks
Pankaj Bhootra
-- Forwarded message
ic. I have
temporarily used a UDF that accepts all these columns as parameters and creates
a JSON string, adding a column "value" for writing to Kafka.
Is there an easier and cleaner way to do the same?
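For reference, Spark 2.1+ ships `to_json` and `struct` in `org.apache.spark.sql.functions`, which can replace such a UDF entirely. A sketch (the sample DataFrame is an illustrative stand-in):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, struct, to_json}

val spark = SparkSession.builder().appName("to-kafka").master("local[*]").getOrCreate()
import spark.implicits._

// Stand-in for the real DataFrame with many columns.
val df = Seq((1, "a"), (2, "b")).toDF("id", "name")

// Pack every column into one JSON string column named "value",
// which is the shape the Kafka sink expects - no UDF needed.
val forKafka = df.select(to_json(struct(df.columns.map(col): _*)).alias("value"))
forKafka.show(false)
```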
Thanks,
Pankaj
      def run(): Unit = {
        println("In shutdown hook")
        // stop gracefully (also stops the underlying SparkContext)
        ssCtx.stop(stopSparkContext = true, stopGracefully = true)
      }
    })
  }
}
Pankaj
On Fri, Dec 22, 2017 at 9:56 AM, Toy <noppani...@gmail.com> wrote:
> I'm trying to write a deployment job for a Spark application. Basically
Please make sure that you have enough memory available on the driver node. If
there is not enough free memory on the driver node, then your application won't
start.
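If memory is the problem, the driver allocation can be raised at submit time. A sketch (class name, master URL, and sizes are placeholders):

```
spark-submit \
  --class com.example.MyApp \
  --master spark://master:7077 \
  --driver-memory 4g \
  my-app.jar
```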
Pankaj
From: vaquar khan <vaquar.k...@gmail.com>
Date: Saturday, June 10, 201
(EventLoop.scala:48)
I see that there is a Spark ticket open with the same
issue (https://issues.apache.org/jira/browse/SPARK-19547), but it has been
marked as INVALID. Can someone explain why this ticket is marked INVALID?
Thanks,
Pankaj
You may want to try using df2.na.fill(…)
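For completeness, `na.fill` takes a per-type default for selected columns, or a per-column Map. A sketch with a toy DataFrame (column names are illustrative):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("na-fill").master("local[*]").getOrCreate()
import spark.implicits._

val df2 = Seq((Some(1), Some("a")), (None, None)).toDF("id", "name")

// Fill nulls column by column...
df2.na.fill(0, Seq("id")).na.fill("unknown", Seq("name")).show()
// ...or all at once with a Map.
df2.na.fill(Map("id" -> 0, "name" -> "unknown")).show()
```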
From: lk_spark
Date: Tuesday, 6 December 2016 at 3:05 PM
To: "user.spark"
Subject: how to add colum to dataframe
hi, all:
My Spark version is 2.0.
I have a parquet file with one column named url; its type is
amount of time taken during execution is fine, but
the process should not fail.
4. What exactly is meant by an Akka timeout error during ALS job execution?
Regards,
Pankaj Rawat
The next thing you may want to check is whether the jar has been provided to all
the executors in your cluster. Most class-not-found errors were resolved for
me after making the required jars available in the SparkContext.
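At submit time this is usually done with `--jars` (comma-separated), which ships the listed jars to every executor; `SparkContext.addJar` does the same from code. A sketch with placeholder paths:

```
spark-submit \
  --jars /libs/dep1.jar,/libs/dep2.jar \
  --class com.example.MyApp \
  my-app.jar
```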
Thanks.
From: Ted Yu
Date:
I am encountering the below error. Can somebody guide me?
Something similar is on this link:
https://github.com/elastic/elasticsearch-hadoop/issues/298
actor.MentionCrawlActor
java.io.NotSerializableException: actor.MentionCrawlActor
at
Technologies http://www.nubetech.co/
Check out Reifier at Spark Summit 2015
https://spark-summit.org/2015/events/real-time-fuzzy-matching-with-spark-and-elastic-search/
http://in.linkedin.com/in/sonalgoyal
On Wed, Aug 26, 2015 at 8:25 AM, Pankaj Wahane pankaj.wah...@qiotec.com
.
Best Regards,
Pankaj
but could not achieve the same.
Does anybody have an idea how to do that?
Regards
Pankaj
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Restart-at-scheduled-intervals-tp24192.html
Sent from the Apache Spark User List mailing list archive
directory to recover from failures
println("tweets for the last stream are saved which can be processed
later")
val checkpointDir = "f:/svn1/checkpoint/"
ssc.checkpoint(checkpointDir)
ssc.start()
ssc.awaitTermination()
regards
Pankaj
--
View this message in context:
http://apache-spark-user-list
Hi, I am using Spark 1.3.1 to read an Avro file stored on HDFS. The Avro
file was created using Avro 1.7.7, similar to the example mentioned in
http://www.infoobjects.com/spark-with-avro/
I am getting a NullPointerException on schema read. It could be an Avro
version mismatch. Has anybody had a
(classOf[Event], new AvroSerializer[Event]()))
}
I encountered a similar error, since several of the Avro core classes are
not marked Serializable.
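One way to wire this up is a custom `KryoRegistrator`, matching the registration snippet quoted above. This is only a sketch: `Event` and `AvroSerializer` stand in for whatever Avro-generated class and serializer (e.g. chill-avro's) you actually use.

```scala
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoRegistrator

// Hypothetical: Event is your Avro-generated class,
// AvroSerializer the serializer from the snippet above.
class EventRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[Event], new AvroSerializer[Event]())
  }
}

// Point Spark at the registrator instead of relying on Java serialization.
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", classOf[EventRegistrator].getName)
```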
HTH.
Todd
On Tue, May 5, 2015 at 7:09 PM, Pankaj Deshpande ppa...@gmail.com wrote:
Hi I am using Spark 1.3.1 to read an avro file stored
Hi,
I have a 3-node Spark cluster:
node1, node2 and node3.
I am running the below command on node 1 to deploy the driver:
/usr/local/spark-1.2.1-bin-hadoop2.4/bin/spark-submit --class
com.fst.firststep.aggregator.FirstStepMessageProcessor --master
spark://ec2-xx-xx-xx-xx.compute-1.amazonaws.com:7077
or
loading data into hive tables.
Thanks,
Pankaj
http://spark.apache.org/docs/latest/
Follow this. It's easy to get started. Use the prebuilt version of Spark as of
now :D
On Thu, Jan 22, 2015 at 5:06 PM, Sudipta Banerjee
asudipta.baner...@gmail.com wrote:
Hi Apache-Spark team ,
What are the system requirements for installing Hadoop and Apache
.map(line => (line.split(",").length, line))
val groupedData = dataLengthRDD.groupByKey()
Now you can process groupedData, as it will have the arrays of length x in
one RDD.
groupByKey([numTasks]): when called on a dataset of (K, V) pairs, returns a
dataset of (K, Iterable<V>) pairs.
I hope this helps
Regards
Pankaj
Send me the current code here; I will fix it and send it back to you.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Finding-most-occurrences-in-a-JSON-Nested-Array-tp20971p21295.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
I just checked the post. Do you still need help?
I think getAs[Seq[String]] should help.
If you are still stuck, let me know.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Finding-most-occurrences-in-a-JSON-Nested-Array-tp20971p21252.html
Sent from
Instead of counted.saveAsTextFile("/path/to/save/dir"), what happens if you
call counted.collect?
If you still face the same issue, please paste the stack trace here.
--
View this message in context:
Good luck. Let me know if I can assist you further.
Regards
-Pankaj
Linkedin
https://www.linkedin.com/profile/view?id=171566646
Skype
pankaj.narang
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/NoSuchMethodError-com-typesafe-config-Config-getDuration
I suggest creating an uber jar instead.
Check my thread on the same:
http://apache-spark-user-list.1001560.n3.nabble.com/NoSuchMethodError-com-typesafe-config-Config-getDuration-with-akka-http-akka-stream-td20926.html
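For reference, building the uber jar with sbt-assembly only needs the plugin wired in (the version below is illustrative; check for the current one):

```scala
// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")
```

Then `sbt assembly` produces a single jar with all dependencies under `target/scala-*/`.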
Regards
-Pankaj
Linkedin
https://www.linkedin.com/profile/view?id=171566646
That's great. I did not have access to the developer machine, so I sent you
the pseudocode only.
Happy to see it's working. If you need any more help related to Spark, let me
know anytime.
--
View this message in context:
As per our telephone call, here is how we can fetch the count:
val tweetsCount = sql("SELECT COUNT(*) FROM tweets")
println(f"\n\n\nThere are ${tweetsCount.collect.head.getLong(0)} Tweets on
this Dataset\n\n")
--
View this message in context:
If you need more help let me know
-Pankaj
Linkedin
https://www.linkedin.com/profile/view?id=171566646
Skype
pankaj.narang
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Finding-most-occurrences-in-a-JSON-Nested-Array-tp20971p20976.html
Sent from
}, {hiking,1}
Now hbmap.map { case (hobby, count) => (count, hobby) }.sortByKey(ascending
= false).collect
will give you the hobbies sorted in descending order by their count.
This is pseudocode and should help you.
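Spelled out as runnable Spark code (the sample data is an illustrative assumption):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("hobby-count").master("local[*]").getOrCreate()
val sc = spark.sparkContext

// Stand-in for the (hobby, count) pairs from the pseudocode above.
val hbmap = sc.parallelize(Seq(("hiking", 1), ("reading", 5), ("chess", 3)))

// Swap to (count, hobby) so sortByKey orders by count, descending.
val sorted = hbmap.map { case (hobby, count) => (count, hobby) }
  .sortByKey(ascending = false)
  .collect()
// Array((5,reading), (3,chess), (1,hiking))
```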
Regards
Pankaj
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com
= popularHashTags.flatMap(x => x.getAs[Seq[String]](0))
If you want, I can even take remote control of your machine to fix that.
Regards
Pankaj
Linkedin
https://www.linkedin.com/profile/view?id=171566646
Skype
pankaj.narang
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com
If you can paste the code here, I can certainly help.
Also, confirm the version of Spark you are using.
Regards
Pankaj
Infoshore Software
India
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/saveAsTextFile-tp20951p20953.html
Sent from the Apache Spark
Did you assemble the uber jar?
You can use sbt assembly to build the jar and then run it. That should fix
the issue.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/NoClassDefFoundError-when-trying-to-run-spark-application-tp20707p20944.html
Sent from the Apache
(FlowMaterializer.scala:256)
I think there is a version mismatch in the jars you use at runtime.
If you need more help, add me on Skype: pankaj.narang
---Pankaj
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/NoSuchMethodError-com-typesafe-config-Config
on RDD are saveAsObjectFile and saveAsTextFile.
Now you can read these files to show them on a web interface in any language
of your choice.
Regards
Pankaj
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Publishing-streaming-results-to-web-interface
.
Pankaj
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Reading-nested-JSON-data-with-Spark-SQL-tp19310p20933.html
Sent from the Apache Spark User List mailing list archive at Nabble.com
Also, it looks like when I store Strings in Parquet and try to fetch
them using Spark code, I get a ClassCastException.
Below is how my arrays of strings are saved: each character's ASCII value is
present in an array of ints.
res25: Array[Seq[String]] = Array(ArrayBuffer(Array(104, 116, 116, 112,
Oops:
sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")
This solved the issue. Important for everyone.
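The same setting in full context, as a sketch (the read path is a placeholder; on Spark 2.x the SparkSession equivalent is `spark.conf.set`):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("parquet-binary").master("local[*]").getOrCreate()

// Interpret Parquet BINARY columns as strings instead of raw byte arrays,
// which avoids the ClassCastException described above.
spark.conf.set("spark.sql.parquet.binaryAsString", "true")

val df = spark.read.parquet("/data/strings.parquet") // placeholder path
```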
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Reading-nested-JSON-data-with-Spark-SQL-tp19310p20936.html
Sent from the Apache Spark User
Hi,
My incoming message has a timestamp as one field, and I have to perform
aggregation over 3-minute time slices.
Message sample
Item ID Item Type timeStamp
1 X 1-12-2014:12:01
1 X 1-12-2014:12:02
1 X
Hi,
Suppose I keep a batch size of 3 minutes. In one batch there can be incoming
records with any timestamp, so it is difficult to keep track of when the
3-minute interval started and ended. I am doing the output operation on worker
nodes in forEachPartition, not in the driver (forEachRdd), so I cannot use
.
Now, I can't figure out why it should run successfully during this
time even though it could not find the SparkContext. I am sure there must be a
good reason behind this behavior. Does anyone have any idea?
Thanks,
Pankaj Channe
On Saturday, November 22, 2014, pankaj channe pankajc...@gmail.com
On Sat, Nov 22, 2014 at 8:39 AM, pankaj channe pankajc...@gmail.com
wrote:
I have seen similar posts on this issue but could not find a solution.
Apologies if this has been discussed here before.
I am running a Spark Streaming job with YARN on a 5-node cluster. I am
using the following command to submit my
:169)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
at
org.apache.spark.storage.DiskBlockManager$$anon$1.run(DiskBlockManager.scala:169)
Note: I am building my jar locally, with the Spark dependency added in
pom.xml, and running it on a cluster running Spark.
-Pankaj