Re: spark on kudu performance!

2018-07-05 Thread Todd Lipcon
On Mon, Jun 11, 2018 at 5:52 AM, fengba...@uce.cn wrote: > Hi: I followed the Kudu official website development documents and used Spark to analyze Kudu data (Kudu version 1.6.0). The official code is: val df = sqlContext.read.options(Map("kudu.master" ->

spark on kudu performance!

2018-06-11 Thread fengba...@uce.cn
Hi: I followed the Kudu official website development documents and used Spark to analyze Kudu data (Kudu version 1.6.0). The official code is: val df = sqlContext.read.options(Map("kudu.master" -> "kudu.master:7051", "kudu.table" -> "kudu_table")).kudu

Re: Spark Streaming + Kudu

2018-03-06 Thread Ravi Kanth
> row = upsert.getRow(); for (Map.Entry<String, Object> entry : formattedMap.entrySet()) { if (entry.getValue().getClass().equals(String.class)) { if (entry.getValue().equals(SpecialNullConstants.specialStringNu

Re: Spark Streaming + Kudu

2018-03-06 Thread Mike Percy
> try.getKey(), (String) entry.getValue()); } else if (entry.getValue().getClass().equals(Long.class)) { if (entry.getValue().equals(SpecialNullConstants.specialLongNull)) row.setNull(entry.getKey());

Re: Spark Streaming + Kudu

2018-03-06 Thread Ravi Kanth
> ; else row.addInt(entry.getKey(), (Integer) entry.getValue()); } } session.apply(upsert); } catch (Exception e) { logger.error("Exception during u
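The fragments quoted above outline a type-dispatch loop that maps sentinel "null" stand-in values onto row.setNull(...) before applying the upsert. A minimal self-contained sketch of that dispatch pattern, with the Kudu PartialRow replaced by an action recorder and hypothetical sentinel constants (the real SpecialNullConstants values are not shown in the thread), might look like:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NullSentinelDispatch {
    // Hypothetical sentinels standing in for the thread's SpecialNullConstants.
    static final String SPECIAL_STRING_NULL = "\u0000__NULL__";
    static final Long SPECIAL_LONG_NULL = Long.MIN_VALUE;

    // Records the calls that would be made on a Kudu PartialRow for each column.
    static List<String> applyRow(Map<String, Object> formattedMap) {
        List<String> actions = new ArrayList<>();
        for (Map.Entry<String, Object> entry : formattedMap.entrySet()) {
            Object v = entry.getValue();
            if (v instanceof String) {
                // Sentinel string means "this column is NULL".
                if (v.equals(SPECIAL_STRING_NULL)) {
                    actions.add("setNull(" + entry.getKey() + ")");
                } else {
                    actions.add("addString(" + entry.getKey() + ")");
                }
            } else if (v instanceof Long) {
                if (v.equals(SPECIAL_LONG_NULL)) {
                    actions.add("setNull(" + entry.getKey() + ")");
                } else {
                    actions.add("addLong(" + entry.getKey() + ")");
                }
            } else if (v instanceof Integer) {
                actions.add("addInt(" + entry.getKey() + ")");
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("name", "alice");
        row.put("age", 42);
        row.put("score", SPECIAL_LONG_NULL);
        // prints [addString(name), addInt(age), setNull(score)]
        System.out.println(applyRow(row));
    }
}
```

In the real code the recorded actions would instead be calls on upsert.getRow(), followed by session.apply(upsert).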

Re: Spark Streaming + Kudu

2018-03-06 Thread Ravi Kanth
> ncCache.containsKey(kuduMaster)) { AsyncKuduClient asyncClient = new AsyncKuduClient.AsyncKuduClientBuilder(kuduMaster).build(); ShutdownHookManager.get().addShutdownHook(new Runnable() { @Override public void run() {

Re: Spark Streaming + Kudu

2018-03-05 Thread Mike Percy
> ; logger.error("Exception closing async client", e); } } }, ShutdownHookPriority); asyncCache.put(kuduMaster, asyncClient); } return asyncCache.get(kuduMaster); } } Thanks,
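The snippet quoted above appears to cache one AsyncKuduClient per master address and register a shutdown hook that closes it on JVM exit. The caching pattern itself can be sketched without any Kudu dependency, using a stand-in AutoCloseable in place of the client (the names here are illustrative, not the poster's actual code):

```java
import java.util.concurrent.ConcurrentHashMap;

public class ClientCache {
    // Stand-in for AsyncKuduClient: anything that must be closed on shutdown.
    static class FakeClient implements AutoCloseable {
        final String master;
        FakeClient(String master) { this.master = master; }
        @Override public void close() { /* release connections here */ }
    }

    private static final ConcurrentHashMap<String, FakeClient> cache =
            new ConcurrentHashMap<>();

    // One client per master address; a JVM shutdown hook closes it exactly once.
    static FakeClient getClient(String kuduMaster) {
        return cache.computeIfAbsent(kuduMaster, master -> {
            FakeClient client = new FakeClient(master);
            Runtime.getRuntime().addShutdownHook(new Thread(() -> {
                try {
                    client.close();
                } catch (Exception e) {
                    System.err.println("Exception closing client: " + e);
                }
            }));
            return client;
        });
    }
}
```

computeIfAbsent avoids the check-then-put race in the quoted containsKey/put version; the thread's code uses Hadoop's ShutdownHookManager instead of a raw Runtime hook, which additionally lets you set a priority.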

Re: Spark Streaming + Kudu

2018-03-05 Thread Ravi Kanth
> from items in the returned array -- and indicate what you were hoping to get back. Note that a RowError can also return to you the Operation <https://kudu.apache.org/releases/1.6.0/apidocs/org/apache/kudu/client/RowError.html#getOperation--> that you use

Re: Spark Streaming + Kudu

2018-03-05 Thread Ravi Kanth
> <https://kudu.apache.org/releases/1.6.0/apidocs/org/apache/kudu/client/RowError.html#getOperation--> that you used to generate the write. From the Operation, you can get the original PartialRow <https://kudu.apache.org/releases/1.6.0/apidocs/org/apache/kudu/client/PartialRow.htm

Re: Spark Streaming + Kudu

2018-03-05 Thread Mike Percy
should be able to identify the affected row that the write failed for. Does that help? Since you are using the Kudu client directly, Spark is not involved from the Kudu perspective, so you will need to deal with Spark on your own in that case. Mike On Mon, Mar 5, 2018 at 1:59 PM, Ravi Kant
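The flow Mike describes — flush, drain the pending errors, and walk each RowError back to the Operation and PartialRow that produced it — would look roughly like the following with the Kudu Java client in AUTO_FLUSH_BACKGROUND mode. This is a sketch against the 1.6-era API (it needs a live session from a real cluster to run); the logging is a placeholder:

```java
import org.apache.kudu.client.KuduSession;
import org.apache.kudu.client.Operation;
import org.apache.kudu.client.PartialRow;
import org.apache.kudu.client.RowError;
import org.apache.kudu.client.RowErrorsAndOverflowStatus;

public class PendingErrorCheck {
    // After a batch of session.apply(...) calls in AUTO_FLUSH_BACKGROUND mode,
    // flush and inspect which individual rows failed.
    static void reportFailures(KuduSession session) throws Exception {
        session.flush();
        RowErrorsAndOverflowStatus pending = session.getPendingErrors();
        if (pending.isOverflowed()) {
            System.err.println("Error buffer overflowed; some row errors were dropped");
        }
        for (RowError error : pending.getRowErrors()) {
            Operation op = error.getOperation(); // the write that failed
            PartialRow row = op.getRow();        // the row it carried
            System.err.println("Write failed: " + error.getErrorStatus()
                    + " for row " + row);
        }
    }
}
```

From the recovered PartialRow the application can decide whether to re-queue, dead-letter, or drop the row, which is the failure-handling question this thread is about.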

Re: Spark Streaming + Kudu

2018-03-05 Thread Ravi Kanth
Hi Mike, Thanks for the reply. Yes, I am using AUTO_FLUSH_BACKGROUND. I am using the Kudu client API to perform UPSERTs into Kudu, and I integrated this with Spark. I am trying to test the case where one of the Kudu servers fails. In that case, if there is any problem in writing,

Re: Spark Streaming + Kudu

2018-03-05 Thread Mike Percy
Hi Ravi, are you using AUTO_FLUSH_BACKGROUND ? You mention that you are trying to use getPendingErrors()

Re: Spark Streaming + Kudu

2018-02-26 Thread Ravi Kanth
Thanks, Clifford. We are running Kudu 1.4. To date we haven't seen any issues in production and we are not losing tablet servers. But as part of testing I have to generate a few unforeseen cases to analyse the application performance. One of them is bringing down the tablet server or

Re: Spark on Kudu Roadmap

2017-04-09 Thread Benjamin Kim
Hi Mike, Thanks for the link. I guess further, deeper Spark integration is slowly coming. As for when, we will have to wait and see. Cheers, Ben > On Mar 27, 2017, at 12:25 PM, Mike Percy wrote: > > Hi Ben, > I don't really know so I'll let someone else more familiar with

Re: Spark on Kudu Roadmap

2017-03-27 Thread Benjamin Kim
Hi Mike, I believe what we are looking for is this below. It is an often-requested use case. Does anyone know if the Spark package will ever allow for creating tables in Spark SQL? Such as: CREATE EXTERNAL TABLE USING org.apache.kudu.spark.kudu OPTIONS (Map("kudu.master" -> “",

Re: Spark on Kudu Roadmap

2017-03-27 Thread Mike Percy
Hi Ben, Is there anything in particular you are looking for? Thanks, Mike On Mon, Mar 27, 2017 at 9:48 AM, Benjamin Kim wrote: > Hi, > > Are there any plans for deeper integration with Spark especially Spark > SQL? Is there a roadmap to look at, so I can know what to expect

Spark on Kudu Roadmap

2017-03-27 Thread Benjamin Kim
Hi, Are there any plans for deeper integration with Spark especially Spark SQL? Is there a roadmap to look at, so I can know what to expect in the future? Cheers, Ben

Re: Spark on Kudu

2016-10-10 Thread Mark Hamstra
I realize that the Spark on Kudu work to date has been based on Spark 1.6, where your statement about Spark SQL relying on Hive is true. In Spark 2.0, however, that dependency no longer exists since Spark SQL essentially copied over the parts of Hive that were needed into Spark itself, and has

Re: Spark on Kudu

2016-09-20 Thread Benjamin Kim
> ready for production use, where do we find the spark connector jar for this release? It's available in the official ASF maven repository: https://repository.apache.org/#nexus-search;quick~kudu-spark <https://repository.apache.org/#nexus-
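For reference, pulling the connector from that repository with Maven would look something like the following. The coordinates shown are the ones published for the Kudu 1.6 / Spark 2 line discussed earlier in this digest; adjust the Scala suffix and version to match your build (this snippet is illustrative, not from the thread):

```xml
<dependency>
  <groupId>org.apache.kudu</groupId>
  <artifactId>kudu-spark2_2.11</artifactId>
  <version>1.6.0</version>
</dependency>
```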