Re: spark on kudu performance!

2018-07-05 Thread Todd Lipcon
On Mon, Jun 11, 2018 at 5:52 AM, fengba...@uce.cn wrote: > Hi: > > I use kudu official website development documents, use > spark analysis kudu data(kudu's version is 1.6.0): > > the official code is : > *val df = sqlContext.read.options(Map("kudu.master" -> > "kudu.master:7051","kudu.table"

Re: Spark Streaming + Kudu

2018-03-06 Thread Ravi Kanth
Mike- I actually got hold of the PIDs for the Spark executors but am having trouble running jstack. There are some VM exceptions. I will figure it out and attach the jstack output. Thanks for your patience. On 6 March 2018 at 20:42, Mike Percy wrote: > Hmm, could you try in

Re: Spark Streaming + Kudu

2018-03-06 Thread Mike Percy
Hmm, could you try in spark local mode? i.e. https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-local.html Mike On Tue, Mar 6, 2018 at 7:14 PM, Ravi Kanth wrote: > Mike, > > Can you clarify a bit on grabbing the jstack for the process? I launched

Re: Spark Streaming + Kudu

2018-03-06 Thread Ravi Kanth
Mike, Can you clarify a bit on grabbing the jstack for the process? I launched my Spark application and tried to get its pid so that I could grab a jstack trace during the hang. Unfortunately, I have not been able to figure out how to get the pid for the Spark application. Thanks, Ravi On 6 March 2018 at

Re: Spark Streaming + Kudu

2018-03-06 Thread Ravi Kanth
Yes, I have debugged to find the root cause. Every log statement before "table = client.openTable(tableName);" executes fine, and exactly at the point of opening the table it throws the exception below and nothing executes after that. The Spark batches are still being processed and at

Re: Spark Streaming + Kudu

2018-03-05 Thread Mike Percy
Have you considered checking your session error count or pending errors in your while loop every so often? Can you identify where your code is hanging when the connection is lost (what line)? Mike On Mon, Mar 5, 2018 at 9:08 PM, Ravi Kanth wrote: > In addition to my

Re: Spark Streaming + Kudu

2018-03-05 Thread Ravi Kanth
In addition to my previous comment, I raised a support ticket for this issue with Cloudera and one of the support people mentioned the following: *"Thank you for clarifying. The exceptions are logged but not re-thrown to an upper layer, so that explains why the Spark application is not aware of the

Re: Spark Streaming + Kudu

2018-03-05 Thread Ravi Kanth
Mike, Thanks for the information. But once the connection to any of the Kudu servers is lost, I have no control over the KuduSession object, and therefore none over getPendingErrors(). The KuduClient in this case becomes a zombie and never returns until the connection is

Re: Spark Streaming + Kudu

2018-03-05 Thread Mike Percy
Hi Ravi, it would be helpful if you could attach what you are getting back from getPendingErrors() -- perhaps from dumping RowError.toString() from items in the returned array -- and indicate what you were hoping to get back. Note that a RowError can also return to you the Operation

Re: Spark Streaming + Kudu

2018-03-05 Thread Ravi Kanth
Hi Mike, Thanks for the reply. Yes, I am using AUTO_FLUSH_BACKGROUND. I am trying to use the Kudu Client API to perform UPSERTs into Kudu, and I have integrated this with Spark. I am trying to test the case where one of the Kudu servers fails. So, in this case, if there is any problem in writing,

Re: Spark Streaming + Kudu

2018-03-05 Thread Mike Percy
Hi Ravi, are you using AUTO_FLUSH_BACKGROUND? You mention that you are trying to use getPendingErrors()

Re: Spark Streaming + Kudu

2018-02-26 Thread Ravi Kanth
Thanks, Clifford. We are running Kudu 1.4. To date we have not seen any issues in production and we are not losing tablet servers. But as part of testing I have to generate a few unforeseen cases to analyse the application's performance. One of them is bringing down the tablet server or

Re: Spark on Kudu Roadmap

2017-04-09 Thread Benjamin Kim
Hi Mike, Thanks for the link. I guess further, deeper Spark integration is slowly coming. But when, we will have to wait and see. Cheers, Ben > On Mar 27, 2017, at 12:25 PM, Mike Percy wrote: > > Hi Ben, > I don't really know so I'll let someone else more familiar with

Re: Spark on Kudu Roadmap

2017-03-27 Thread Benjamin Kim
Hi Mike, I believe what we are looking for is this below. It is an often-requested use case. Does anyone know if the Spark package will ever allow for creating tables in Spark SQL? Such as: CREATE EXTERNAL TABLE USING org.apache.kudu.spark.kudu OPTIONS (Map("kudu.master" -> “",

Re: Spark on Kudu Roadmap

2017-03-27 Thread Mike Percy
Hi Ben, Is there anything in particular you are looking for? Thanks, Mike On Mon, Mar 27, 2017 at 9:48 AM, Benjamin Kim wrote: > Hi, > > Are there any plans for deeper integration with Spark especially Spark > SQL? Is there a roadmap to look at, so I can know what to expect

Re: Spark on Kudu

2016-10-10 Thread Mark Hamstra
I realize that the Spark on Kudu work to date has been based on Spark 1.6, where your statement about Spark SQL relying on Hive is true. In Spark 2.0, however, that dependency no longer exists since Spark SQL essentially copied over the parts of Hive that were needed into Spark itself, and has

Re: Spark on Kudu

2016-09-20 Thread Benjamin Kim
Thanks! > On Sep 20, 2016, at 3:02 PM, Jordan Birdsell > wrote: > > http://kudu.apache.org/docs/developing.html#_kudu_integration_with_spark > > > On Tue, Sep 20, 2016 at 5:00 PM Benjamin