Re: New Spark Datasource for Hive ACID tables

2019-07-27 Thread naresh Goud
_2.11 ...and it will be automatically fetched and used. Thanks, Abhishek. On Sun, Jul 28, 2019 at 4:42 AM naresh Goud wrote: > It looks like there is some internal dependency missing. > libraryDependencies ++= Seq( "c

Re: New Spark Datasource for Hive ACID tables

2019-07-27 Thread naresh Goud
On Sat, Jul 27, 2019 at 5:34 PM naresh Goud wrote: > Hi Abhishek, > We are not able to build the jar from the GitHub code; we get the error below. > Is anyone else able to build the jars? Is there anything else missing? > Note: Unresolved dependencies path: > [warn]

Re: New Spark Datasource for Hive ACID tables

2019-07-27 Thread naresh Goud
Hi Abhishek, We are not able to build the jar from the GitHub code; we get the error below. Is anyone else able to build the jars? Is there anything else missing? Note: Unresolved dependencies path: [warn] com.qubole:spark-acid-shaded-dependencies_2.11:0.1

Re: New Spark Datasource for Hive ACID tables

2019-07-26 Thread naresh Goud
Thanks Abhishek. Will it work on a Hive ACID table which is not compacted, i.e. a table having base and delta files? Let's say we have a Hive ACID table customer: create table customer(customer_id int, customer_name string, customer_email string) clustered by (customer_id) into 10 buckets location ‘/test/customer’

Re: Spark SQL

2019-06-19 Thread naresh Goud
use the hive parser or optimization engine. Instead it uses Catalyst, see https://databricks.com/blog/2015/04/13/deep-dive-into-spark-sqls-catalyst-optimizer.html On Mon, Jun 10, 2019 at 2:07 PM naresh Goud wrote: > Hi Team, > Is Spark Sql

Override jars in spark submit

2019-06-19 Thread naresh Goud
Hello All, How can we override jars in spark-submit? We have a hive-exec-spark jar that ships as part of the default Spark cluster jars. We want to override that jar in spark-submit with a newer version. How do we do that? Thank you, Naresh -- Thanks, Naresh
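For illustration, one hedged approach (not from the original thread): pass the newer jar with --jars and set spark.driver.userClassPathFirst=true and spark.executor.userClassPathFirst=true at submit time so the user-supplied jar wins over the cluster's copy. The sketch below only verifies which copy actually got loaded; the class name is just an example of one that lives in hive-exec.

    // Hedged sketch: after submitting with
    //   --jars /path/to/newer-hive-exec.jar \
    //   --conf spark.driver.userClassPathFirst=true \
    //   --conf spark.executor.userClassPathFirst=true
    // check which copy of a hive-exec class the driver actually loaded.
    val hiveClass = Class.forName("org.apache.hadoop.hive.ql.exec.Utilities")
    println(s"hive-exec loaded from: ${hiveClass.getProtectionDomain.getCodeSource.getLocation}")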

Spark SQL

2019-06-10 Thread naresh Goud
Hi Team, Does Spark SQL use the Hive engine to run queries? My understanding is that Spark SQL uses the Hive metastore to get the metadata it needs to run queries. Thank you, Naresh -- Thanks, Naresh www.linkedin.com/in/naresh-dulam http://hadoopandspark.blogspot.com/
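As a small sketch of the distinction (assuming a configured Hive metastore and Spark 2.x; the table name is hypothetical): enabling Hive support lets Spark SQL read table metadata from the Hive metastore, but the query itself is planned by Catalyst and executed by Spark, not by the Hive engine.

    import org.apache.spark.sql.SparkSession

    // Sketch: Hive support gives Spark SQL access to the Hive metastore for
    // table metadata, but the query is planned by Catalyst and run by Spark.
    val spark = SparkSession.builder()
      .appName("hive-metastore-example")
      .enableHiveSupport()
      .getOrCreate()

    val df = spark.sql("SELECT count(*) FROM some_db.some_table")  // hypothetical table
    df.explain()   // the plan shown is a Spark/Catalyst plan, not a Hive plan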

Re: Subscribe Multiple Topics Structured Streaming

2018-09-17 Thread naresh Goud
You can subscribe to multiple topics with a statement like the one below (a fuller sketch follows): val dfStatus = spark.readStream .format("kafka") .option("subscribe", "utility-status,utility-critical") .option("kafka.bootstrap.servers", "localhost:9092") .option("startingOffsets", "earliest") .load() On Mon,
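A fuller sketch of the same idea (broker address and topic names are placeholders): "subscribe" takes a comma-separated list of topics, and "subscribePattern" is an alternative when topic names follow a pattern.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("multi-topic").getOrCreate()

    // One comma-separated "subscribe" option covers several topics.
    val dfStatus = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "utility-status,utility-critical")
      .option("startingOffsets", "earliest")
      .load()

    // Alternative: subscribe by regex when topic names share a pattern.
    val dfAll = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribePattern", "utility-.*")
      .load()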

Re: java.nio.file.FileSystemException: /tmp/spark- .._cache : No space left on device

2018-08-19 Thread naresh Goud
Also check that enough space is available on the /tmp directory. On Fri, Aug 17, 2018 at 10:14 AM Jeevan K. Srivatsa <jeevansriva...@gmail.com> wrote: > Hi Venkata, > On a quick glance, it looks like a file-related issue more so than an executor issue. If the logs are not that important, I would clear
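If clearing /tmp is not enough, one hedged option is to point Spark's scratch space at a larger disk via spark.local.dir; the path below is only a placeholder, and the assumption is that the cluster manager does not override it.

    import org.apache.spark.sql.SparkSession

    // Sketch: move Spark's temporary/shuffle scratch files off a small /tmp.
    // "/data/spark-tmp" is a placeholder. On YARN/Mesos the cluster manager's
    // local-dir settings take precedence, so passing
    // --conf spark.local.dir=... at submit time (or changing the cluster-side
    // config) may be needed instead of setting it in code.
    val spark = SparkSession.builder()
      .appName("bigger-scratch-dir")
      .config("spark.local.dir", "/data/spark-tmp")
      .getOrCreate()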

Re: Unable to alter partition. The transaction for alter partition did not commit successfully.

2018-05-30 Thread naresh Goud
Could you give more details on what you are doing? On Wed, May 30, 2018 at 12:58 PM Arun Hive wrote: > Hi, > While running my spark job component I am getting the following exception. Requesting your help on this: > Spark core version - spark-core_2.10-2.1.1 > Spark

Re: ERROR: Hive on Spark

2018-04-16 Thread naresh Goud
Change your table name in the query to spam.spamdataset instead of spamdataset. On Sun, Apr 15, 2018 at 2:12 PM Rishikesh Gawade wrote: > Hello there. I am a newbie in the world of Spark. I have been working on a Spark project using Java. > I have configured Hive and
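A minimal sketch of the fix in Scala (the thread itself uses Java; a Hive-enabled SparkSession named spark is assumed, and the database/table names are taken from the reply above):

    // Either qualify the table name with its database...
    val df1 = spark.sql("SELECT * FROM spam.spamdataset")

    // ...or switch the current database and use the bare table name.
    spark.sql("USE spam")
    val df2 = spark.sql("SELECT * FROM spamdataset")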

Re: [Spark sql]: Re-execution of same operation takes less time than 1st

2018-04-03 Thread naresh Goud
Whenever Spark reads data, it keeps it in executor memory unless there is no room left for newly read or processed data. This is the beauty of Spark. On Tue, Apr 3, 2018 at 12:42 AM snjv wrote: > Hi, > When we execute the same operation twice, spark
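Rather than relying on whatever happens to still sit in executor memory, the data can be cached explicitly; a minimal sketch (the path is a placeholder and a SparkSession named spark is assumed):

    // Sketch: cache the DataFrame so the second action is served from memory
    // instead of being recomputed / re-read from storage.
    val df = spark.read.parquet("/data/events")   // placeholder path
    df.cache()

    df.count()   // first action: reads the files and populates the cache
    df.count()   // second action: runs against the cached partitions, much faster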

Re: How does extending an existing parquet with columns affect impala/spark performance?

2018-04-03 Thread naresh Goud
From Spark's point of view it shouldn't matter: you can extend new parquet files with additional columns without affecting performance, and the Spark application code does not need to change. On Tue, Apr 3, 2018 at 9:14 AM Vitaliy Pisarev wrote: > This is not strictly a
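When older and newer files with different column sets are read together, Spark can reconcile them with schema merging; a hedged sketch (paths are placeholders, and mergeSchema adds a one-time cost when the read is planned):

    // Sketch: read old files (fewer columns) and new files (extra columns) together.
    // With mergeSchema=true Spark unions the schemas; rows from the old files
    // simply get null for the columns they never had.
    val df = spark.read
      .option("mergeSchema", "true")
      .parquet("/data/table_v1", "/data/table_v2")   // placeholder paths

    df.printSchema()   // shows the superset of columns across both locations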

Re: java.lang.UnsupportedOperationException: CSV data source does not support struct/ERROR RetryingBlockFetcher

2018-03-27 Thread naresh Goud
When storing as a parquet file, I don't think the option("header","true") setting is required. Try removing the header option and then reading it again. I haven't tried this myself; just a thought. Thank you, Naresh On Tue, Mar 27, 2018 at 9:47 PM Mina Aslani wrote: > Hi,
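A small sketch of that suggestion (column names and paths are made up; a SparkSession named spark is assumed): the header option belongs to the CSV source, so a parquet write simply omits it, and parquet also handles the struct columns that the CSV source rejects.

    import org.apache.spark.sql.functions.{col, struct}

    // Sketch: a DataFrame with a struct column writes to parquet with no header
    // option at all; the CSV source would reject the struct column.
    val df = spark.range(3)
      .withColumn("nested", struct(col("id"), (col("id") * 2).as("doubled")))

    df.write.mode("overwrite").parquet("/tmp/out_parquet")   // no .option("header", ...)

    spark.read.parquet("/tmp/out_parquet").printSchema()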

Re: is there a way to catch exceptions on executor level

2018-03-10 Thread naresh Goud
How about accumulators? Thanks, Naresh www.linkedin.com/in/naresh-dulam http://hadoopandspark.blogspot.com/ On Thu, Mar 8, 2018 at 12:07 AM Chethan Bhawarlal <cbhawar...@collectivei.com> wrote: > Hi Dev, > I am doing spark operations at the RDD level for each row, like this: > private def
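A hedged sketch of the accumulator idea (inputRdd and parse() are placeholders for the poster's own code): count failures inside the row-level code on the executors and read the totals back on the driver once an action has run.

    // Sketch: count executor-side failures with an accumulator and read the
    // total on the driver. Accumulator values are only reliable after an
    // action completes, and updates made inside transformations may be
    // re-applied on task retries, so treat them as approximate there.
    val errorCount = spark.sparkContext.longAccumulator("rowErrors")

    val parsed = inputRdd.flatMap { line =>        // inputRdd: assumed RDD[String]
      try {
        Some(parse(line))                          // parse() is a placeholder
      } catch {
        case e: Exception =>
          errorCount.add(1)                        // recorded on the executor
          None                                     // drop the bad row
      }
    }

    parsed.count()                                 // run an action first
    println(s"rows that failed to parse: ${errorCount.value}")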

Re: Reading kafka and save to parquet problem

2018-03-07 Thread naresh Goud
Change it to readStream instead of read, as below: val df = spark .readStream .format("kafka") .option("kafka.bootstrap.servers", "host1:port1,host2:port2") .option("subscribe", "topic1") .load() Check whether this helps; a fuller end-to-end sketch follows.
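Continuing that suggestion, a minimal end-to-end sketch (broker, topic, and output paths are placeholders): the streaming read is paired with a writeStream to parquet, which also needs a checkpoint location.

    import org.apache.spark.sql.streaming.Trigger

    // Sketch: stream from Kafka and append the key/value pairs to parquet.
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
      .option("subscribe", "topic1")
      .load()
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

    val query = df.writeStream
      .format("parquet")
      .option("path", "/data/topic1_parquet")              // placeholder output path
      .option("checkpointLocation", "/data/topic1_chkpt")  // required for file sinks
      .trigger(Trigger.ProcessingTime("30 seconds"))
      .start()

    query.awaitTermination()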

Re: How does Spark Structured Streaming determine an event has arrived late?

2018-02-27 Thread naresh Goud
Hi Kant, TD's explanation makes a lot of sense. Refer to this Stack Overflow answer, where it is explained with program output. Hope this helps. https://stackoverflow.com/questions/45579100/structured-streaming-watermark-vs-exactly-once-semantics Thanks, Naresh www.linkedin.com/in/naresh-dulam
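For concreteness, a small sketch of where the watermark enters the picture (events, eventTime, and deviceId are placeholder names for a streaming DataFrame and its columns): the watermark trails the maximum event time seen so far, and windows older than it are finalized, with later arrivals for them dropped.

    import org.apache.spark.sql.functions.{col, window}

    // Sketch: the watermark trails the max event time seen by 10 minutes;
    // windows older than the watermark are finalized, and rows arriving for
    // them afterwards are dropped as "too late".
    val counts = events                                   // events: a streaming DataFrame
      .withWatermark("eventTime", "10 minutes")
      .groupBy(window(col("eventTime"), "5 minutes"), col("deviceId"))
      .count()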

Re: Out of memory Error when using Collection Accumulator Spark 2.2

2018-02-26 Thread naresh Goud
What is your driver memory? Thanks, Naresh www.linkedin.com/in/naresh-dulam http://hadoopandspark.blogspot.com/ On Mon, Feb 26, 2018 at 3:45 AM, Patrick wrote: > Hi, > We were getting an OOM error when accumulating the results of each worker. We were trying to

Re: partitionBy with partitioned column in output?

2018-02-26 Thread naresh Goud
Does this help? Duplicate the partition column before writing so a copy of it stays in the output (with org.apache.spark.sql.functions.col imported): sc.parallelize(List((1, 10), (2, 20))).toDF("foo", "bar").withColumn("foo_part", col("foo")).write.partitionBy("foo_part").json("json-out") On Mon, Feb 26, 2018 at 4:28 PM, Alex Nastetsky wrote: > Is there a way to make outputs created with "partitionBy" to

Re: Trigger.ProcessingTime("10 seconds") & Trigger.Continuous(10.seconds)

2018-02-26 Thread naresh Goud
://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc5-docs/_site/structured-streaming-programming-guide.html#continuous-processing On Sun, Feb 25, 2018 at 12:26 PM, naresh Goud <nareshgoud.du...@gmail.com> wrote: > Hello Spark Experts, > What i

Re: Spark structured streaming: periodically refresh static data frame

2018-02-25 Thread naresh Goud
Appu, I have also run into the same problem. Were you able to solve this issue? Could you please share a snippet of code if you were able to? Thanks, Naresh On Wed, Feb 14, 2018 at 8:04 PM, Tathagata Das wrote: > 1. Just loop like this. > def startQuery(): Streaming
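One common pattern for this, sketched with placeholder helpers and not claiming to be the code from the quoted reply: run the streaming query for a bounded interval, stop it, reload the static DataFrame, and start again.

    // Hedged sketch of a restart loop: reload the static side periodically and
    // restart the streaming query so it picks up the fresh snapshot.
    // loadStaticData() and buildQuery() are placeholders for application code.
    while (true) {
      val staticDf = loadStaticData()           // e.g. spark.read.parquet(...)
      val query = buildQuery(staticDf)          // joins the stream, returns a started StreamingQuery
      query.awaitTermination(60 * 60 * 1000L)   // let it run for about an hour
      query.stop()                              // stop, loop, and refresh
    }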

Trigger.ProcessingTime("10 seconds") & Trigger.Continuous(10.seconds)

2018-02-25 Thread naresh Goud
Hello Spark Experts, What is the difference between Trigger.Continuous(10.seconds) and Trigger.ProcessingTime("10 seconds") ? Thank you, Naresh
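For reference, a sketch of how the two triggers are attached (Spark 2.3+; df is an assumed streaming DataFrame and the console sink is only a placeholder): Trigger.ProcessingTime starts a micro-batch every interval, while Trigger.Continuous uses the experimental continuous-processing engine, where the interval only controls how often offsets are checkpointed.

    import org.apache.spark.sql.streaming.Trigger

    // Micro-batch mode: a new batch is kicked off every 10 seconds.
    val microBatch = df.writeStream
      .format("console")
      .trigger(Trigger.ProcessingTime("10 seconds"))
      .start()

    // Continuous mode (experimental in Spark 2.3): records are processed
    // continuously; "10 seconds" is only the checkpoint interval.
    val continuous = df.writeStream
      .format("console")
      .trigger(Trigger.Continuous("10 seconds"))
      .start()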

Re: Consuming Data in Parallel using Spark Streaming

2018-02-22 Thread naresh Goud
entiate records of one type of entity from another type of entity. -Beejal From: naresh Goud [mailto:nareshgoud.du...@gmail.com] Sent: Friday, February 23, 2018 8:56 AM To: Vibhakar, Beejal <beejal.vibha...@fisglobal.com>

Re: Spark not releasing shuffle files in time (with very large heap)

2018-02-22 Thread naresh Goud
Regards, Keith. http://keith-chapman.com On Thu, Feb 22, 2018 at 6:58 PM, naresh Goud <nareshgoud.du...@gmail.com> wrote: > It would be very difficult to tell without knowing what your application code is doing, what kind of transformation/

Re: Spark not releasing shuffle files in time (with very large heap)

2018-02-22 Thread naresh Goud
It would be very difficult to tell without knowing what your application code is doing and what kind of transformations/actions it performs. From my previous experience, tuning application code to avoid unnecessary objects reduces pressure on the GC. On Thu, Feb 22, 2018 at 2:13 AM, Keith Chapman

Re: Return statements aren't allowed in Spark closures

2018-02-22 Thread naresh Goud
I am not able to reproduce the error either. On Thu, Feb 22, 2018 at 2:51 AM, Michael Artz wrote: > I am not able to reproduce your error. You should do something before you do that last function and maybe get some more help from the exception it returns. Like just add a

Re: KafkaUtils.createStream(..) is removed for API

2018-02-18 Thread naresh Goud
ark/streaming/kafka010/KafkaUtils.scala FYI On Sun, Feb 18, 2018 at 5:17 PM, naresh Goud <nareshgoud.du...@gmail.com> wrote: > Hello Team, > I see the "KafkaUtils.createStream()" method is not available in Spark 2.2.1.
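For completeness, a sketch of the kafka010 replacement: the receiver-based createStream only exists in the older spark-streaming-kafka-0-8 integration, while the spark-streaming-kafka-0-10 artifact the linked file belongs to exposes createDirectStream, shaped as in the Spark 2.2 integration guide. Broker address, group id, and topic are placeholders, and an existing StreamingContext ssc is assumed.

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    // Placeholder Kafka settings; ssc is an existing StreamingContext.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "example-group",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](Array("topic1"), kafkaParams)
    )

    stream.map(record => (record.key, record.value)).print()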

KafkaUtils.createStream(..) is removed for API

2018-02-18 Thread naresh Goud
Hello Team, I see the "KafkaUtils.createStream()" method is not available in Spark 2.2.1. Can someone please confirm whether these methods have been removed? Below are my pom.xml entries (Scala 2.11.8 / 2.11):
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming_${scala.tools.version}</artifactId>
  <version>2.2.1</version>
  <scope>provided</scope>
</dependency>

Re: Issue with Cast in Spark Sql

2018-01-30 Thread naresh Goud
Spark/Hive converts the decimal to a null value if we specify a precision greater than the precision available in the file. The example below gives you the details. I am not sure why it converts to null. Note: you need to trim the string before casting to decimal. Table data with col1 and col2 columns: val r =
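A small illustration of one way this shows up (the values are made up, a SparkSession named spark is assumed, and this relies on Spark's default non-ANSI cast behavior): when a value needs more digits than the declared decimal type can hold, the cast silently comes back as null instead of failing.

    import spark.implicits._
    import org.apache.spark.sql.functions.{col, trim}

    // Sketch: decimal(4,2) can hold at most 99.99, so "123.456" overflows and
    // the cast returns null. Trimming first avoids nulls caused by stray
    // whitespace around the value in the file.
    val df = Seq(" 12.34 ", "123.456").toDF("raw")

    df.select(
      col("raw"),
      trim(col("raw")).cast("decimal(4,2)").as("as_decimal")   // 12.34 and null
    ).show()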

Re: How to hold some data in memory while processing rows in a DataFrame?

2018-01-22 Thread naresh Goud
If I understand your requirement correctly: use broadcast variables to replicate, across all nodes, the small amount of data you want to reuse. On Mon, Jan 22, 2018 at 9:24 PM David Rosenstrauch wrote: > This seems like an easy thing to do, but I've been banging my head
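A minimal sketch of that suggestion (the lookup map, its contents, and inputRdd are placeholders): broadcast the small dataset once, then reference it inside the per-row code.

    // Sketch: ship the small lookup data to every executor once via a broadcast
    // variable, then reference it inside per-row code.
    val lookup: Map[String, String] = Map("US" -> "United States", "IN" -> "India")
    val bcLookup = spark.sparkContext.broadcast(lookup)

    val resolved = inputRdd.map { code =>      // inputRdd: assumed RDD[String] of country codes
      bcLookup.value.getOrElse(code, "unknown")
    }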