Re: New Spark Datasource for Hive ACID tables

2019-07-27 Thread naresh Goud
_2.11 ...and it will be automatically fetched and used. Thanks, Abhishek. On Sun, Jul 28, 2019 at 4:42 AM naresh Goud wrote: > It looks like there is some internal dependency missing. > libraryDependencies ++= Seq( "c

Re: New Spark Datasource for Hive ACID tables

2019-07-27 Thread naresh Goud
On Sat, Jul 27, 2019 at 5:34 PM naresh Goud wrote: > Hi Abhishek, > We are not able to build the jar from the GitHub code; we get the error below. > Is anyone else able to build the jars? Is there anything else missing? > Note: Unresolved dependencies path: > [warn]

Re: New Spark Datasource for Hive ACID tables

2019-07-27 Thread naresh Goud
Hi Abhishek, We are not able to build the jar from the GitHub code; we get the error below. Is anyone else able to build the jars? Is there anything else missing? Note: Unresolved dependencies path: [warn] com.qubole:spark-acid-shaded-dependencies_2.11:0.1

Re: New Spark Datasource for Hive ACID tables

2019-07-26 Thread naresh Goud
Thanks Abhishek. Will it work on a Hive ACID table which is not compacted, i.e. a table having base and delta files? Let's say we have a Hive ACID table customer: create table customer(customer_id int, customer_name string, customer_email string) clustered by (customer_id) into 10 buckets location ‘/test/customer’

Re: Spark SQL

2019-06-19 Thread naresh Goud
use the hive parser or optimization engine. Instead it uses Catalyst, see https://databricks.com/blog/2015/04/13/deep-dive-into-spark-sqls-catalyst-optimizer.html On Mon, Jun 10, 2019 at 2:07 PM naresh Goud wrote: > Hi Team, > Is Spark Sql

Override jars in spark submit

2019-06-19 Thread naresh Goud
Hello All, How can we override jars in spark-submit? We have a hive-exec-spark jar that ships as part of the default Spark cluster jars. We want to override that jar in spark-submit with a newer version. How do we do that? Thank you, Naresh -- Thanks, Naresh
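For illustration, one hedged approach (not from the original thread): pass the newer jar with --jars and set spark.driver.userClassPathFirst=true and spark.executor.userClassPathFirst=true at submit time so the user-supplied jar wins over the cluster's copy. The sketch below only verifies which copy actually got loaded; the class name is just an example of one that lives in hive-exec.

    // Hedged sketch: after submitting with
    //   --jars /path/to/newer-hive-exec.jar \
    //   --conf spark.driver.userClassPathFirst=true \
    //   --conf spark.executor.userClassPathFirst=true
    // check which copy of a hive-exec class the driver actually loaded.
    val hiveClass = Class.forName("org.apache.hadoop.hive.ql.exec.Utilities")
    println(s"hive-exec loaded from: ${hiveClass.getProtectionDomain.getCodeSource.getLocation}")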

Spark SQL

2019-06-10 Thread naresh Goud
Hi Team, Does Spark SQL use the Hive engine to run queries? My understanding is that Spark SQL uses the Hive metastore to get the metadata it needs to run queries. Thank you, Naresh -- Thanks, Naresh www.linkedin.com/in/naresh-dulam http://hadoopandspark.blogspot.com/
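As a small sketch of the distinction (assuming a configured Hive metastore and Spark 2.x; the table name is hypothetical): enabling Hive support lets Spark SQL read table metadata from the Hive metastore, but the query itself is planned by Catalyst and executed by Spark, not by the Hive engine.

    import org.apache.spark.sql.SparkSession

    // Sketch: Hive support gives Spark SQL access to the Hive metastore for
    // table metadata, but the query is planned by Catalyst and run by Spark.
    val spark = SparkSession.builder()
      .appName("hive-metastore-example")
      .enableHiveSupport()
      .getOrCreate()

    val df = spark.sql("SELECT count(*) FROM some_db.some_table")  // hypothetical table
    df.explain()   // the plan shown is a Spark/Catalyst plan, not a Hive plan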

Re: Subscribe Multiple Topics Structured Streaming

2018-09-17 Thread naresh Goud
You can subscribe to multiple topics with a statement like the one below (a fuller sketch follows): val dfStatus = spark.readStream .format("kafka") .option("subscribe", "utility-status,utility-critical") .option("kafka.bootstrap.servers", "localhost:9092") .option("startingOffsets", "earliest") .load() On Mon,
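A fuller sketch of the same idea (broker address and topic names are placeholders): "subscribe" takes a comma-separated list of topics, and "subscribePattern" is an alternative when topic names follow a pattern.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("multi-topic").getOrCreate()

    // One comma-separated "subscribe" option covers several topics.
    val dfStatus = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "utility-status,utility-critical")
      .option("startingOffsets", "earliest")
      .load()

    // Alternative: subscribe by regex when topic names share a pattern.
    val dfAll = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribePattern", "utility-.*")
      .load()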

Re: java.nio.file.FileSystemException: /tmp/spark- .._cache : No space left on device

2018-08-19 Thread naresh Goud
Also check that enough space is available on the /tmp directory. On Fri, Aug 17, 2018 at 10:14 AM Jeevan K. Srivatsa <jeevansriva...@gmail.com> wrote: > Hi Venkata, > On a quick glance, it looks like a file-related issue more so than an executor issue. If the logs are not that important, I would clear
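If clearing /tmp is not enough, one hedged option is to point Spark's scratch space at a larger disk via spark.local.dir; the path below is only a placeholder, and the assumption is that the cluster manager does not override it.

    import org.apache.spark.sql.SparkSession

    // Sketch: move Spark's temporary/shuffle scratch files off a small /tmp.
    // "/data/spark-tmp" is a placeholder. On YARN/Mesos the cluster manager's
    // local-dir settings take precedence, so passing
    // --conf spark.local.dir=... at submit time (or changing the cluster-side
    // config) may be needed instead of setting it in code.
    val spark = SparkSession.builder()
      .appName("bigger-scratch-dir")
      .config("spark.local.dir", "/data/spark-tmp")
      .getOrCreate()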

Re: Unable to alter partition. The transaction for alter partition did not commit successfully.

2018-05-30 Thread naresh Goud
Could you give more details on what you are doing? On Wed, May 30, 2018 at 12:58 PM Arun Hive wrote: > Hi, > While running my spark job component I am getting the following exception. Requesting your help on this: > Spark core version - spark-core_2.10-2.1.1 > Spark

Re: ERROR: Hive on Spark

2018-04-16 Thread naresh Goud
Change your table name in the query to spam.spamdataset instead of spamdataset. On Sun, Apr 15, 2018 at 2:12 PM Rishikesh Gawade wrote: > Hello there. I am a newbie in the world of Spark. I have been working on a Spark project using Java. > I have configured Hive and
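A minimal sketch of the fix in Scala (the thread itself uses Java; a Hive-enabled SparkSession named spark is assumed, and the database/table names are taken from the reply above):

    // Either qualify the table name with its database...
    val df1 = spark.sql("SELECT * FROM spam.spamdataset")

    // ...or switch the current database and use the bare table name.
    spark.sql("USE spam")
    val df2 = spark.sql("SELECT * FROM spamdataset")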

Re: [Spark sql]: Re-execution of same operation takes less time than 1st

2018-04-03 Thread naresh Goud
Whenever Spark reads data, it keeps it in executor memory unless there is no room left for newly read or processed data. This is the beauty of Spark. On Tue, Apr 3, 2018 at 12:42 AM snjv wrote: > Hi, > When we execute the same operation twice, spark
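Rather than relying on whatever happens to still sit in executor memory, the data can be cached explicitly; a minimal sketch (the path is a placeholder and a SparkSession named spark is assumed):

    // Sketch: cache the DataFrame so the second action is served from memory
    // instead of being recomputed / re-read from storage.
    val df = spark.read.parquet("/data/events")   // placeholder path
    df.cache()

    df.count()   // first action: reads the files and populates the cache
    df.count()   // second action: runs against the cached partitions, much faster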

Re: How does extending an existing parquet with columns affect impala/spark performance?

2018-04-03 Thread naresh Goud
From Spark's point of view it shouldn't matter: you can extend new parquet files with additional columns without affecting performance, and the Spark application code does not need to change. On Tue, Apr 3, 2018 at 9:14 AM Vitaliy Pisarev wrote: > This is not strictly a
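When older and newer files with different column sets are read together, Spark can reconcile them with schema merging; a hedged sketch (paths are placeholders, and mergeSchema adds a one-time cost when the read is planned):

    // Sketch: read old files (fewer columns) and new files (extra columns) together.
    // With mergeSchema=true Spark unions the schemas; rows from the old files
    // simply get null for the columns they never had.
    val df = spark.read
      .option("mergeSchema", "true")
      .parquet("/data/table_v1", "/data/table_v2")   // placeholder paths

    df.printSchema()   // shows the superset of columns across both locations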

Re: java.lang.UnsupportedOperationException: CSV data source does not support struct/ERROR RetryingBlockFetcher

2018-03-27 Thread naresh Goud
When storing as a parquet file, I don't think the option("header","true") setting is required. Try removing the header option and then reading it again. I haven't tried this myself; just a thought. Thank you, Naresh On Tue, Mar 27, 2018 at 9:47 PM Mina Aslani wrote: > Hi,
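A small sketch of that suggestion (column names and paths are made up; a SparkSession named spark is assumed): the header option belongs to the CSV source, so a parquet write simply omits it, and parquet also handles the struct columns that the CSV source rejects.

    import org.apache.spark.sql.functions.{col, struct}

    // Sketch: a DataFrame with a struct column writes to parquet with no header
    // option at all; the CSV source would reject the struct column.
    val df = spark.range(3)
      .withColumn("nested", struct(col("id"), (col("id") * 2).as("doubled")))

    df.write.mode("overwrite").parquet("/tmp/out_parquet")   // no .option("header", ...)

    spark.read.parquet("/tmp/out_parquet").printSchema()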

Re: is there a way to catch exceptions on executor level

2018-03-10 Thread naresh Goud
How about accumulators? Thanks, Naresh www.linkedin.com/in/naresh-dulam http://hadoopandspark.blogspot.com/ On Thu, Mar 8, 2018 at 12:07 AM Chethan Bhawarlal <cbhawar...@collectivei.com> wrote: > Hi Dev, > I am doing spark operations at the RDD level for each row, like this: > private def
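A hedged sketch of the accumulator idea (inputRdd and parse() are placeholders for the poster's own code): count failures inside the row-level code on the executors and read the totals back on the driver once an action has run.

    // Sketch: count executor-side failures with an accumulator and read the
    // total on the driver. Accumulator values are only reliable after an
    // action completes, and updates made inside transformations may be
    // re-applied on task retries, so treat them as approximate there.
    val errorCount = spark.sparkContext.longAccumulator("rowErrors")

    val parsed = inputRdd.flatMap { line =>        // inputRdd: assumed RDD[String]
      try {
        Some(parse(line))                          // parse() is a placeholder
      } catch {
        case e: Exception =>
          errorCount.add(1)                        // recorded on the executor
          None                                     // drop the bad row
      }
    }

    parsed.count()                                 // run an action first
    println(s"rows that failed to parse: ${errorCount.value}")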

Re: Reading kafka and save to parquet problem

2018-03-07 Thread naresh Goud
Change it to readStream instead of read, as below: val df = spark .readStream .format("kafka") .option("kafka.bootstrap.servers", "host1:port1,host2:port2") .option("subscribe", "topic1") .load() Check whether this helps; a fuller end-to-end sketch follows.
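Continuing that suggestion, a minimal end-to-end sketch (broker, topic, and output paths are placeholders): the streaming read is paired with a writeStream to parquet, which also needs a checkpoint location.

    import org.apache.spark.sql.streaming.Trigger

    // Sketch: stream from Kafka and append the key/value pairs to parquet.
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
      .option("subscribe", "topic1")
      .load()
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

    val query = df.writeStream
      .format("parquet")
      .option("path", "/data/topic1_parquet")              // placeholder output path
      .option("checkpointLocation", "/data/topic1_chkpt")  // required for file sinks
      .trigger(Trigger.ProcessingTime("30 seconds"))
      .start()

    query.awaitTermination()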

Re: How does Spark Structured Streaming determine an event has arrived late?

2018-02-27 Thread naresh Goud
Hi Kant, TD's explanation makes a lot of sense. Refer to this Stack Overflow answer, where it is explained with program output. Hope this helps. https://stackoverflow.com/questions/45579100/structured-streaming-watermark-vs-exactly-once-semantics Thanks, Naresh www.linkedin.com/in/naresh-dulam
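For concreteness, a small sketch of where the watermark enters the picture (events, eventTime, and deviceId are placeholder names for a streaming DataFrame and its columns): the watermark trails the maximum event time seen so far, and windows older than it are finalized, with later arrivals for them dropped.

    import org.apache.spark.sql.functions.{col, window}

    // Sketch: the watermark trails the max event time seen by 10 minutes;
    // windows older than the watermark are finalized, and rows arriving for
    // them afterwards are dropped as "too late".
    val counts = events                                   // events: a streaming DataFrame
      .withWatermark("eventTime", "10 minutes")
      .groupBy(window(col("eventTime"), "5 minutes"), col("deviceId"))
      .count()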

Re: Out of memory Error when using Collection Accumulator Spark 2.2

2018-02-26 Thread naresh Goud
What is your driver memory? Thanks, Naresh www.linkedin.com/in/naresh-dulam http://hadoopandspark.blogspot.com/ On Mon, Feb 26, 2018 at 3:45 AM, Patrick wrote: > Hi, > We were getting an OOM error when accumulating the results of each worker. We were trying to

Re: partitionBy with partitioned column in output?

2018-02-26 Thread naresh Goud
Does this help? Duplicate the partition column before writing so a copy of it stays in the output (with org.apache.spark.sql.functions.col imported): sc.parallelize(List((1, 10), (2, 20))).toDF("foo", "bar").withColumn("foo_part", col("foo")).write.partitionBy("foo_part").json("json-out") On Mon, Feb 26, 2018 at 4:28 PM, Alex Nastetsky wrote: > Is there a way to make outputs created with "partitionBy" to

Re: Trigger.ProcessingTime("10 seconds") & Trigger.Continuous(10.seconds)

2018-02-26 Thread naresh Goud
://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc5-docs/_site/structured-streaming-programming-guide.html#continuous-processing On Sun, Feb 25, 2018 at 12:26 PM, naresh Goud <nareshgoud.du...@gmail.com> wrote: > Hello Spark Experts, > What i

Re: Spark structured streaming: periodically refresh static data frame

2018-02-25 Thread naresh Goud
Appu, I have also run into the same problem. Were you able to solve this issue? Could you please share a snippet of code if you were able to? Thanks, Naresh On Wed, Feb 14, 2018 at 8:04 PM, Tathagata Das wrote: > 1. Just loop like this. > def startQuery(): Streaming
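One common pattern for this, sketched with placeholder helpers and not claiming to be the code from the quoted reply: run the streaming query for a bounded interval, stop it, reload the static DataFrame, and start again.

    // Hedged sketch of a restart loop: reload the static side periodically and
    // restart the streaming query so it picks up the fresh snapshot.
    // loadStaticData() and buildQuery() are placeholders for application code.
    while (true) {
      val staticDf = loadStaticData()           // e.g. spark.read.parquet(...)
      val query = buildQuery(staticDf)          // joins the stream, returns a started StreamingQuery
      query.awaitTermination(60 * 60 * 1000L)   // let it run for about an hour
      query.stop()                              // stop, loop, and refresh
    }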

Trigger.ProcessingTime("10 seconds") & Trigger.Continuous(10.seconds)

2018-02-25 Thread naresh Goud
Hello Spark Experts, What is the difference between Trigger.Continuous(10.seconds) and Trigger.ProcessingTime("10 seconds") ? Thank you, Naresh
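For reference, a sketch of how the two triggers are attached (Spark 2.3+; df is an assumed streaming DataFrame and the console sink is only a placeholder): Trigger.ProcessingTime starts a micro-batch every interval, while Trigger.Continuous uses the experimental continuous-processing engine, where the interval only controls how often offsets are checkpointed.

    import org.apache.spark.sql.streaming.Trigger

    // Micro-batch mode: a new batch is kicked off every 10 seconds.
    val microBatch = df.writeStream
      .format("console")
      .trigger(Trigger.ProcessingTime("10 seconds"))
      .start()

    // Continuous mode (experimental in Spark 2.3): records are processed
    // continuously; "10 seconds" is only the checkpoint interval.
    val continuous = df.writeStream
      .format("console")
      .trigger(Trigger.Continuous("10 seconds"))
      .start()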

Re: Consuming Data in Parallel using Spark Streaming

2018-02-22 Thread naresh Goud
entiate records of one type of entity from another type of entity. -Beejal From: naresh Goud [mailto:nareshgoud.du...@gmail.com] Sent: Friday, February 23, 2018 8:56 AM To: Vibhakar, Beejal <beejal.vibha...@fisglobal.com>

Re: Spark not releasing shuffle files in time (with very large heap)

2018-02-22 Thread naresh Goud
Regards, Keith. http://keith-chapman.com On Thu, Feb 22, 2018 at 6:58 PM, naresh Goud <nareshgoud.du...@gmail.com> wrote: > It would be very difficult to tell without knowing what your application code is doing, what kind of transformation/

Re: Spark not releasing shuffle files in time (with very large heap)

2018-02-22 Thread naresh Goud
It would be very difficult to tell without knowing what your application code is doing and what kind of transformations/actions it performs. From my previous experience, tuning application code to avoid unnecessary objects reduces pressure on the GC. On Thu, Feb 22, 2018 at 2:13 AM, Keith Chapman

Re: Return statements aren't allowed in Spark closures

2018-02-22 Thread naresh Goud
I am not able to reproduce the error either. On Thu, Feb 22, 2018 at 2:51 AM, Michael Artz wrote: > I am not able to reproduce your error. You should do something before you do that last function and maybe get some more help from the exception it returns. Like just add a

Re: KafkaUtils.createStream(..) is removed for API

2018-02-18 Thread naresh Goud
ark/streaming/kafka010/KafkaUtils.scala FYI On Sun, Feb 18, 2018 at 5:17 PM, naresh Goud <nareshgoud.du...@gmail.com> wrote: > Hello Team, > I see the "KafkaUtils.createStream()" method is not available in Spark 2.2.1.
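For completeness, a sketch of the kafka010 replacement: the receiver-based createStream only exists in the older spark-streaming-kafka-0-8 integration, while the spark-streaming-kafka-0-10 artifact the linked file belongs to exposes createDirectStream, shaped as in the Spark 2.2 integration guide. Broker address, group id, and topic are placeholders, and an existing StreamingContext ssc is assumed.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    // Placeholder Kafka settings; ssc is an existing StreamingContext.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "example-group",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](Array("topic1"), kafkaParams)
    )

    stream.map(record => (record.key, record.value)).print()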

KafkaUtils.createStream(..) is removed for API

2018-02-18 Thread naresh Goud
Hello Team, I see the "KafkaUtils.createStream()" method is not available in Spark 2.2.1. Can someone please confirm whether these methods have been removed? Below are my pom.xml entries (Scala 2.11.8 / 2.11):
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming_${scala.tools.version}</artifactId>
  <version>2.2.1</version>
  <scope>provided</scope>
</dependency>

Re: Issue with Cast in Spark Sql

2018-01-30 Thread naresh Goud
Spark/Hive converts the decimal to a null value if we specify a precision greater than the precision available in the file. The example below gives you the details. I am not sure why it converts to null. Note: you need to trim the string before casting to decimal. Table data with col1 and col2 columns: val r =
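A small illustration of one way this shows up (the values are made up, a SparkSession named spark is assumed, and this relies on Spark's default non-ANSI cast behavior): when a value needs more digits than the declared decimal type can hold, the cast silently comes back as null instead of failing.

    import spark.implicits._
    import org.apache.spark.sql.functions.{col, trim}

    // Sketch: decimal(4,2) can hold at most 99.99, so "123.456" overflows and
    // the cast returns null. Trimming first avoids nulls caused by stray
    // whitespace around the value in the file.
    val df = Seq(" 12.34 ", "123.456").toDF("raw")

    df.select(
      col("raw"),
      trim(col("raw")).cast("decimal(4,2)").as("as_decimal")   // 12.34 and null
    ).show()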

Re: How to hold some data in memory while processing rows in a DataFrame?

2018-01-22 Thread naresh Goud
If I understand your requirement correctly: use broadcast variables to replicate, across all nodes, the small amount of data you want to reuse. On Mon, Jan 22, 2018 at 9:24 PM David Rosenstrauch wrote: > This seems like an easy thing to do, but I've been banging my head
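A minimal sketch of that suggestion (the lookup map, its contents, and inputRdd are placeholders): broadcast the small dataset once, then reference it inside the per-row code.

    // Sketch: ship the small lookup data to every executor once via a broadcast
    // variable, then reference it inside per-row code.
    val lookup: Map[String, String] = Map("US" -> "United States", "IN" -> "India")
    val bcLookup = spark.sparkContext.broadcast(lookup)

    val resolved = inputRdd.map { code =>      // inputRdd: assumed RDD[String] of country codes
      bcLookup.value.getOrElse(code, "unknown")
    }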