Re: can I use Spark Standalone with HDFS but no YARN

2017-02-03 Thread kant kodali
I have 3 Spark Masters colocated with the ZooKeeper nodes and 2 Worker nodes,
so my NameNodes are on the same nodes as my Spark Masters and my DataNodes are
on the same nodes as my Spark Workers. Is that correct? How do I set up HDFS
with ZooKeeper?

On Fri, Feb 3, 2017 at 10:27 PM, Mark Hamstra 
wrote:

> yes
>
> On Fri, Feb 3, 2017 at 10:08 PM, kant kodali  wrote:
>
>> can I use Spark Standalone with HDFS but no YARN?
>>
>> Thanks!
>>
>
>


Re: can I use Spark Standalone with HDFS but no YARN

2017-02-03 Thread kant kodali
On Fri, Feb 3, 2017 at 10:27 PM, Mark Hamstra 
wrote:

> yes
>
> On Fri, Feb 3, 2017 at 10:08 PM, kant kodali  wrote:
>
>> can I use Spark Standalone with HDFS but no YARN?
>>
>> Thanks!
>>
>
>


Re: can I use Spark Standalone with HDFS but no YARN

2017-02-03 Thread Mark Hamstra
yes

On Fri, Feb 3, 2017 at 10:08 PM, kant kodali  wrote:

> can I use Spark Standalone with HDFS but no YARN?
>
> Thanks!
>


can I use Spark Standalone with HDFS but no YARN

2017-02-03 Thread kant kodali
can I use Spark Standalone with HDFS but no YARN?

Thanks!


Re: How do I dynamically add nodes to spark standalone cluster and be able to discover them?

2017-02-03 Thread kant kodali
Sorry, I should just do this:

./start-slave.sh spark://x.x.x.x:7077,y.y.y.y:7077,z.z.z.z:7077

But what about export SPARK_MASTER_HOST="x.x.x.x y.y.y.y z.z.z.z"?  Don't
I need to have that on my worker node?

Thanks!



On Fri, Feb 3, 2017 at 4:57 PM, kant kodali  wrote:

> Hi,
>
> How do I start a slave? just run start-slave.sh script? but then I don't
> understand the following.
>
> I put the following in spark-env.sh in the worker machine
>
> export SPARK_MASTER_HOST="x.x.x.x y.y.y.y z.z.z.z"
>
> but start-slave.sh doesn't seem to take SPARK_MASTER_HOST env variable. so
> I did the following
>
> ./start-slave.sh spark://x.x.x.x:7077 spark://y.y.y.y:7077
> spark://z.z.z.z:7077
>
> This didn't quite work either. any ideas?
>
> Thanks!
>
>
>
>
>
>
>
> On Wed, Jan 25, 2017 at 7:12 PM, Raghavendra Pandey <
> raghavendra.pan...@gmail.com> wrote:
>
>> When you start a slave you pass address of master as a parameter. That
>> slave will contact master and register itself.
>>
>> On Jan 25, 2017 4:12 AM, "kant kodali"  wrote:
>>
>>> Hi,
>>>
>>> How do I dynamically add nodes to spark standalone cluster and be able
>>> to discover them? Does Zookeeper do service discovery? What is the standard
>>> tool for these things?
>>>
>>> Thanks,
>>> kant
>>>
>>
>


Re: How do I dynamically add nodes to spark standalone cluster and be able to discover them?

2017-02-03 Thread kant kodali
Hi,

How do I start a slave? Just run the start-slave.sh script? But then I don't
understand the following.

I put the following in spark-env.sh on the worker machine:

export SPARK_MASTER_HOST="x.x.x.x y.y.y.y z.z.z.z"

but start-slave.sh doesn't seem to pick up the SPARK_MASTER_HOST env variable, so
I did the following:

./start-slave.sh spark://x.x.x.x:7077 spark://y.y.y.y:7077
spark://z.z.z.z:7077

This didn't quite work either. Any ideas?

Thanks!







On Wed, Jan 25, 2017 at 7:12 PM, Raghavendra Pandey <
raghavendra.pan...@gmail.com> wrote:

> When you start a slave you pass address of master as a parameter. That
> slave will contact master and register itself.
>
> On Jan 25, 2017 4:12 AM, "kant kodali"  wrote:
>
>> Hi,
>>
>> How do I dynamically add nodes to spark standalone cluster and be able to
>> discover them? Does Zookeeper do service discovery? What is the standard
>> tool for these things?
>>
>> Thanks,
>> kant
>>
>


Re: Spark submit on yarn does not return with exit code 1 on exception

2017-02-03 Thread Shashank Mandil
I may have found my problem. We have a Scala wrapper on top of spark-submit
that runs the shell command through Scala.
We were effectively swallowing the exit code from spark-submit in that wrapper.
When I stripped away the wrapper and looked at the actual exit code, I got 1.

So I think spark-submit is behaving okay in my case.
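
For anyone hitting the same thing, here is a minimal sketch (not our actual
wrapper; the master, class name, and jar path are placeholders) of how a Scala
wrapper can propagate spark-submit's exit code instead of swallowing it:

import scala.sys.process._

object SubmitWrapper {
  def main(args: Array[String]): Unit = {
    // Hypothetical spark-submit invocation; swap in your own master/class/jar
    val cmd = Seq(
      "spark-submit",
      "--master", "yarn",
      "--class", "com.example.Test", // hypothetical main class
      "/path/to/my-job.jar"          // hypothetical jar
    )

    // `.!` runs the command and returns its exit code; pass it through unchanged
    val exitCode = cmd.!
    sys.exit(exitCode)
  }
}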

Thank you for all the help.

Thanks,
Shashank

On Fri, Feb 3, 2017 at 1:56 PM, Jacek Laskowski  wrote:

> Hi,
>
> ➜  spark git:(master) ✗ ./bin/spark-submit whatever || echo $?
> Error: Cannot load main class from JAR file:/Users/jacek/dev/oss/
> spark/whatever
> Run with --help for usage help or --verbose for debug output
> 1
>
> I see 1 and there are other cases for 1 too.
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Fri, Feb 3, 2017 at 10:46 PM, Ali Gouta  wrote:
> > Hello,
> >
> > +1, i have exactly the same issue. I need the exit code to make a
> decision
> > on oozie executing actions. Spark-submit always returns 0 when catching
> the
> > exception. From spark 1.5 to 1.6.x, i still have the same issue... It
> would
> > be great to fix it or to know if there is some work around about it.
> >
> > Ali Gouta.
> >
> > Le 3 févr. 2017 22:24, "Jacek Laskowski"  a écrit :
> >
> > Hi,
> >
> > An interesting case. You don't use Spark resources whatsoever.
> > Creating a SparkConf does not use YARN...yet. I think any run mode
> > would have the same effect. So, although spark-submit could have
> > returned exit code 1, the use case touches Spark very little.
> >
> > What version is that? Do you see "There is an exception in the script
> > exiting with status 1" printed out to stdout?
> >
> > Pozdrawiam,
> > Jacek Laskowski
> > 
> > https://medium.com/@jaceklaskowski/
> > Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
> > Follow me at https://twitter.com/jaceklaskowski
> >
> >
> > On Fri, Feb 3, 2017 at 8:06 PM, Shashank Mandil
> >  wrote:
> >> Hi All,
> >>
> >> I wrote a test script which always throws an exception as below :
> >>
> >> object Test {
> >>
> >>
> >>   def main(args: Array[String]) {
> >> try {
> >>
> >>   val conf =
> >> new SparkConf()
> >>   .setAppName("Test")
> >>
> >>   throw new RuntimeException("Some Exception")
> >>
> >>   println("all done!")
> >> } catch {
> >>   case e: RuntimeException => {
> >> println("There is an exception in the script exiting with status
> >> 1")
> >> System.exit(1)
> >>   }
> >> }
> >> }
> >>
> >> When I run this code using spark-submit I am expecting to get an exit
> code
> >> of 1,
> >> however I keep getting exit code 0.
> >>
> >> Any ideas how I can force spark-submit to return with code 1 ?
> >>
> >> Thanks,
> >> Shashank
> >>
> >>
> >>
> >
> > -
> > To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> >
> >
>


Re: [ML] MLeap: Deploy Spark ML Pipelines w/o SparkContext

2017-02-03 Thread Hollin Wilkins
Hey Asher,

A phone call may be the best way to discuss all of this. But in short:
1. It is quite easy to add custom pipelines/models to MLeap. All of our
out-of-the-box transformers can serve as good examples of how to do this.
We are also putting together documentation on how to do this on our docs
website.
2. MLlib models are not supported, but it wouldn't be too difficult to add
support for them.
3. We have benchmarked this, and with MLeap it was roughly 2200x faster
than a SparkContext with a LocalRelation-backed DataFrame. The pipeline we
used for benchmarking included string indexing, one-hot encoding, vector
assembly, scaling and a linear regression model. The reason for the speed
difference is that MLeap is optimized for one-off requests, while Spark is
incredible for scoring large batches of data because it takes time to
optimize your pipeline before execution. That optimization time is
noticeable when trying to build services around models.
4. TensorFlow support is early, but we have already built pipelines
including a Spark pipeline and a TensorFlow neural network all served from
one MLeap pipeline, using the same data structures as you would with just a
regular Spark pipeline. Eventually we will offer TensorFlow support as a
module that *just works TM* from Maven Central, but we are not quite there
yet.

Feel free to email me privately if you would like to discuss any of this
more, or join our gitter:
https://gitter.im/combust/mleap

Best,
Hollin

On Fri, Feb 3, 2017 at 10:48 AM, Asher Krim  wrote:

> I have a bunch of questions for you Hollin:
>
> How easy is it to add support for custom pipelines/models?
> Are Spark mllib models supported?
> We currently run spark in local mode in an api service. It's not super
> terrible, but performance is a constant struggle. Have you benchmarked any
> performance differences between MLeap and vanilla Spark?
> What does Tensorflow support look like? I would love to serve models from
> a java stack while being agnostic to what framework was used to train them.
>
> Thanks,
> Asher Krim
> Senior Software Engineer
>
> On Fri, Feb 3, 2017 at 11:53 AM, Hollin Wilkins  wrote:
>
>> Hey Aseem,
>>
>> We have built pipelines that execute several string indexers, one hot
>> encoders, scaling, and a random forest or linear regression at the end.
>> Execution time for the linear regression was on the order of 11
>> microseconds, a bit longer for random forest. This can be further optimized
>> by using row-based transformations if your pipeline is simple to around 2-3
>> microseconds. The pipeline operated on roughly 12 input features, and by
>> the time all the processing was done, we had somewhere around 1000 features
>> or so going into the linear regression after one hot encoding and
>> everything else.
>>
>> Hope this helps,
>> Hollin
>>
>> On Fri, Feb 3, 2017 at 4:05 AM, Aseem Bansal 
>> wrote:
>>
>>> Does this support Java 7?
>>>
>>> On Fri, Feb 3, 2017 at 5:30 PM, Aseem Bansal 
>>> wrote:
>>>
 Is computational time for predictions on the order of few milliseconds
 (< 10 ms) like the old mllib library?

 On Thu, Feb 2, 2017 at 10:12 PM, Hollin Wilkins 
 wrote:

> Hey everyone,
>
>
> Some of you may have seen Mikhail and I talk at Spark/Hadoop Summits
> about MLeap and how you can use it to build production services from your
> Spark-trained ML pipelines. MLeap is an open-source technology that allows
> Data Scientists and Engineers to deploy Spark-trained ML Pipelines and
> Models to a scoring engine instantly. The MLeap execution engine has no
> dependencies on a Spark context and the serialization format is entirely
> based on Protobuf 3 and JSON.
>
>
> The recent 0.5.0 release provides serialization and inference support
> for close to 100% of Spark transformers (we don’t yet support ALS and 
> LDA).
>
>
> MLeap is open-source, take a look at our Github page:
>
> https://github.com/combust/mleap
>
>
> Or join the conversation on Gitter:
>
> https://gitter.im/combust/mleap
>
>
> We have a set of documentation to help get you started here:
>
> http://mleap-docs.combust.ml/
>
>
> We even have a set of demos, for training ML Pipelines and linear,
> logistic and random forest models:
>
> https://github.com/combust/mleap-demo
>
>
> Check out our latest MLeap-serving Docker image, which allows you to
> expose a REST interface to your Spark ML pipeline models:
>
> http://mleap-docs.combust.ml/mleap-serving/
>
>
> Several companies are using MLeap in production and even more are
> currently evaluating it. Take a look and tell us what you think! We hope 
> to
> talk with you soon and welcome feedback/suggestions!
>
>
> Sincerely,
>
> Hollin and Mikhail

Re: Spark submit on yarn does not return with exit code 1 on exception

2017-02-03 Thread Jacek Laskowski
Hi,

➜  spark git:(master) ✗ ./bin/spark-submit whatever || echo $?
Error: Cannot load main class from JAR file:/Users/jacek/dev/oss/spark/whatever
Run with --help for usage help or --verbose for debug output
1

I see 1 and there are other cases for 1 too.

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Fri, Feb 3, 2017 at 10:46 PM, Ali Gouta  wrote:
> Hello,
>
> +1, i have exactly the same issue. I need the exit code to make a decision
> on oozie executing actions. Spark-submit always returns 0 when catching the
> exception. From spark 1.5 to 1.6.x, i still have the same issue... It would
> be great to fix it or to know if there is some work around about it.
>
> Ali Gouta.
>
> Le 3 févr. 2017 22:24, "Jacek Laskowski"  a écrit :
>
> Hi,
>
> An interesting case. You don't use Spark resources whatsoever.
> Creating a SparkConf does not use YARN...yet. I think any run mode
> would have the same effect. So, although spark-submit could have
> returned exit code 1, the use case touches Spark very little.
>
> What version is that? Do you see "There is an exception in the script
> exiting with status 1" printed out to stdout?
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Fri, Feb 3, 2017 at 8:06 PM, Shashank Mandil
>  wrote:
>> Hi All,
>>
>> I wrote a test script which always throws an exception as below :
>>
>> object Test {
>>
>>
>>   def main(args: Array[String]) {
>> try {
>>
>>   val conf =
>> new SparkConf()
>>   .setAppName("Test")
>>
>>   throw new RuntimeException("Some Exception")
>>
>>   println("all done!")
>> } catch {
>>   case e: RuntimeException => {
>> println("There is an exception in the script exiting with status
>> 1")
>> System.exit(1)
>>   }
>> }
>> }
>>
>> When I run this code using spark-submit I am expecting to get an exit code
>> of 1,
>> however I keep getting exit code 0.
>>
>> Any ideas how I can force spark-submit to return with code 1 ?
>>
>> Thanks,
>> Shashank
>>
>>
>>
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Spark submit on yarn does not return with exit code 1 on exception

2017-02-03 Thread Ali Gouta
Hello,

+1, i have exactly the same issue. I need the exit code to make a decision
on oozie executing actions. Spark-submit always returns 0 when catching the
exception. From spark 1.5 to 1.6.x, i still have the same issue... It would
be great to fix it or to know if there is some work around about it.

Ali Gouta.

Le 3 févr. 2017 22:24, "Jacek Laskowski"  a écrit :

Hi,

An interesting case. You don't use Spark resources whatsoever.
Creating a SparkConf does not use YARN...yet. I think any run mode
would have the same effect. So, although spark-submit could have
returned exit code 1, the use case touches Spark very little.

What version is that? Do you see "There is an exception in the script
exiting with status 1" printed out to stdout?

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Fri, Feb 3, 2017 at 8:06 PM, Shashank Mandil
 wrote:
> Hi All,
>
> I wrote a test script which always throws an exception as below :
>
> object Test {
>
>
>   def main(args: Array[String]) {
> try {
>
>   val conf =
> new SparkConf()
>   .setAppName("Test")
>
>   throw new RuntimeException("Some Exception")
>
>   println("all done!")
> } catch {
>   case e: RuntimeException => {
> println("There is an exception in the script exiting with status
1")
> System.exit(1)
>   }
> }
> }
>
> When I run this code using spark-submit I am expecting to get an exit code
> of 1,
> however I keep getting exit code 0.
>
> Any ideas how I can force spark-submit to return with code 1 ?
>
> Thanks,
> Shashank
>
>
>

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org


Re: Spark submit on yarn does not return with exit code 1 on exception

2017-02-03 Thread Jacek Laskowski
Hi,

An interesting case. You don't use Spark resources whatsoever.
Creating a SparkConf does not use YARN...yet. I think any run mode
would have the same effect. So, although spark-submit could have
returned exit code 1, the use case touches Spark very little.

What version is that? Do you see "There is an exception in the script
exiting with status 1" printed out to stdout?

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Fri, Feb 3, 2017 at 8:06 PM, Shashank Mandil
 wrote:
> Hi All,
>
> I wrote a test script which always throws an exception as below :
>
> object Test {
>
>
>   def main(args: Array[String]) {
> try {
>
>   val conf =
> new SparkConf()
>   .setAppName("Test")
>
>   throw new RuntimeException("Some Exception")
>
>   println("all done!")
> } catch {
>   case e: RuntimeException => {
> println("There is an exception in the script exiting with status 1")
> System.exit(1)
>   }
> }
> }
>
> When I run this code using spark-submit I am expecting to get an exit code
> of 1,
> however I keep getting exit code 0.
>
> Any ideas how I can force spark-submit to return with code 1 ?
>
> Thanks,
> Shashank
>
>
>

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: sqlContext vs spark.

2017-02-03 Thread Jacek Laskowski
Hi,

Yes. Forget about SQLContext. It's been merged into SparkSession as of
Spark 2.0 (the same goes for HiveContext).

Long live SparkSession! :-)
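
For reference, a minimal sketch of the Spark 2.x equivalent (the JDBC URL,
table, and credentials below are placeholders, and the JDBC driver is assumed
to be on the classpath):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("jdbc-example")
  .master("local[*]") // for a quick local test
  .getOrCreate()

// spark.read replaces sqlContext.read
val jdbcDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://dbhost:5432/mydb") // hypothetical URL
  .option("dbtable", "public.users")                   // hypothetical table
  .option("user", "spark")
  .option("password", "secret")
  .load()

// The old entry point is still reachable for legacy code:
val sqlContext = spark.sqlContext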

Jacek


On 3 Feb 2017 7:48 p.m., "☼ R Nair (रविशंकर नायर)" <
ravishankar.n...@gmail.com> wrote:

All,

In Spark 1.6.0, we used

val jdbcDF = sqlContext.read.format(-)

for creating a data frame through JDBC.

In Spark 2.1.x, we have seen this is
val jdbcDF = *spark*.read.format(-)

Does that mean we should not be using sqlContext going forward? Also, we
see that sqlContext is not auto initialized while running spark-shell.
Please advise, thanks

Best, Ravion


Re: HBase Spark

2017-02-03 Thread Benjamin Kim
Asher,

I found a profile for Spark 2.11 and removed it. Now, it brings in 2.10. I ran 
some code and got further. Now, I get this error below when I do a “df.show”.

java.lang.AbstractMethodError
at org.apache.spark.Logging$class.log(Logging.scala:50)
at 
org.apache.spark.sql.execution.datasources.hbase.HBaseFilter$.log(HBaseFilter.scala:122)
at 
org.apache.spark.sql.execution.datasources.hbase.HBaseFilter$.buildFilters(HBaseFilter.scala:125)
at 
org.apache.spark.sql.execution.datasources.hbase.HBaseTableScanRDD.getPartitions(HBaseTableScan.scala:59)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)

Thanks for all your help.

Cheers,
Ben


> On Feb 3, 2017, at 8:16 AM, Asher Krim  wrote:
> 
> Did you check the actual maven dep tree? Something might be pulling in a 
> different version. Also, if you're seeing this locally, you might want to 
> check which version of the scala sdk your IDE is using
> 
> Asher Krim
> Senior Software Engineer
> 
> 
> On Thu, Feb 2, 2017 at 5:43 PM, Benjamin Kim  > wrote:
> Hi Asher,
> 
> I modified the pom to be the same Spark (1.6.0), HBase (1.2.0), and Java 
> (1.8) version as our installation. The Scala (2.10.5) version is already the 
> same as ours. But I’m still getting the same error. Can you think of anything 
> else?
> 
> Cheers,
> Ben
> 
> 
>> On Feb 2, 2017, at 11:06 AM, Asher Krim > > wrote:
>> 
>> Ben,
>> 
>> That looks like a scala version mismatch. Have you checked your dep tree?
>> 
>> Asher Krim
>> Senior Software Engineer
>> 
>> 
>> On Thu, Feb 2, 2017 at 1:28 PM, Benjamin Kim > > wrote:
>> Elek,
>> 
>> Can you give me some sample code? I can’t get mine to work.
>> 
>> import org.apache.spark.sql.{SQLContext, _}
>> import org.apache.spark.sql.execution.datasources.hbase._
>> import org.apache.spark.{SparkConf, SparkContext}
>> 
>> def cat = s"""{
>> |"table":{"namespace":"ben", "name":"dmp_test", 
>> "tableCoder":"PrimitiveType"},
>> |"rowkey":"key",
>> |"columns":{
>> |"col0":{"cf":"rowkey", "col":"key", "type":"string"},
>> |"col1":{"cf":"d", "col":"google_gid", "type":"string"}
>> |}
>> |}""".stripMargin
>> 
>> import sqlContext.implicits._
>> 
>> def withCatalog(cat: String): DataFrame = {
>> sqlContext
>> .read
>> .options(Map(HBaseTableCatalog.tableCatalog->cat))
>> .format("org.apache.spark.sql.execution.datasources.hbase")
>> .load()
>> }
>> 
>> val df = withCatalog(cat)
>> df.show
>> 
>> It gives me this error.
>> 
>> java.lang.NoSuchMethodError: 
>> scala.runtime.ObjectRef.create(Ljava/lang/Object;)Lscala/runtime/ObjectRef;
>>  at 
>> org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog$.apply(HBaseTableCatalog.scala:232)
>>  at 
>> org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.(HBaseRelation.scala:77)
>>  at 
>> org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:51)
>>  at 
>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
>>  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>> 
>> If you can please help, I would be grateful.
>> 
>> Cheers,
>> Ben
>> 
>> 
>>> On Jan 31, 2017, at 1:02 PM, Marton, Elek >> > wrote:
>>> 
>>> 
>>> I tested this one with hbase 1.2.4:
>>> 
>>> https://github.com/hortonworks-spark/shc 
>>> 
>>> 
>>> Marton
>>> 
>>> On 01/31/2017 09:17 PM, Benjamin Kim wrote:
 Does anyone know how to backport the HBase Spark module to HBase 1.2.0? I 
 tried to build it from source, but I cannot get it to work.
 
 Thanks,
 Ben
 -
 To unsubscribe e-mail: user-unsubscr...@spark.apache.org 
 
 
>>> 
>>> -
>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org 
>>> 
>>> 
>> 
>> 
> 
> 



Re: HBase Spark

2017-02-03 Thread Asher Krim
You can see in the tree what's pulling in 2.11. Your options then are to
either shade those dependencies and add an explicit dependency on Scala 2.10.5
in your pom, or to explore upgrading your project to 2.11 (which will
require using a _2.11 build of Spark).


On Fri, Feb 3, 2017 at 2:03 PM, Benjamin Kim  wrote:

> Asher,
>
> You’re right. I don’t see anything but 2.11 being pulled in. Do you know
> where I can change this?
>
> Cheers,
> Ben
>
>
> On Feb 3, 2017, at 10:50 AM, Asher Krim  wrote:
>
> Sorry for my persistence, but did you actually run "mvn dependency:tree
> -Dverbose=true"? And did you see only scala 2.10.5 being pulled in?
>
> On Fri, Feb 3, 2017 at 12:33 PM, Benjamin Kim  wrote:
>
>> Asher,
>>
>> It’s still the same. Do you have any other ideas?
>>
>> Cheers,
>> Ben
>>
>>
>> On Feb 3, 2017, at 8:16 AM, Asher Krim  wrote:
>>
>> Did you check the actual maven dep tree? Something might be pulling in a
>> different version. Also, if you're seeing this locally, you might want to
>> check which version of the scala sdk your IDE is using
>>
>> Asher Krim
>> Senior Software Engineer
>>
>> On Thu, Feb 2, 2017 at 5:43 PM, Benjamin Kim  wrote:
>>
>>> Hi Asher,
>>>
>>> I modified the pom to be the same Spark (1.6.0), HBase (1.2.0), and Java
>>> (1.8) version as our installation. The Scala (2.10.5) version is already
>>> the same as ours. But I’m still getting the same error. Can you think of
>>> anything else?
>>>
>>> Cheers,
>>> Ben
>>>
>>>
>>> On Feb 2, 2017, at 11:06 AM, Asher Krim  wrote:
>>>
>>> Ben,
>>>
>>> That looks like a scala version mismatch. Have you checked your dep tree?
>>>
>>> Asher Krim
>>> Senior Software Engineer
>>>
>>> On Thu, Feb 2, 2017 at 1:28 PM, Benjamin Kim  wrote:
>>>
 Elek,

 Can you give me some sample code? I can’t get mine to work.

 import org.apache.spark.sql.{SQLContext, _}
 import org.apache.spark.sql.execution.datasources.hbase._
 import org.apache.spark.{SparkConf, SparkContext}

 def cat = s"""{
 |"table":{"namespace":"ben", "name":"dmp_test",
 "tableCoder":"PrimitiveType"},
 |"rowkey":"key",
 |"columns":{
 |"col0":{"cf":"rowkey", "col":"key", "type":"string"},
 |"col1":{"cf":"d", "col":"google_gid", "type":"string"}
 |}
 |}""".stripMargin

 import sqlContext.implicits._

 def withCatalog(cat: String): DataFrame = {
 sqlContext
 .read
 .options(Map(HBaseTableCatalog.tableCatalog->cat))
 .format("org.apache.spark.sql.execution.datasources.hbase")
 .load()
 }

 val df = withCatalog(cat)
 df.show


 It gives me this error.

 java.lang.NoSuchMethodError: scala.runtime.ObjectRef.create
 (Ljava/lang/Object;)Lscala/runtime/ObjectRef;
 at org.apache.spark.sql.execution.datasources.hbase.HBaseTableC
 atalog$.apply(HBaseTableCatalog.scala:232)
 at org.apache.spark.sql.execution.datasources.hbase.HBaseRelati
 on.(HBaseRelation.scala:77)
 at org.apache.spark.sql.execution.datasources.hbase.DefaultSour
 ce.createRelation(HBaseRelation.scala:51)
 at org.apache.spark.sql.execution.datasources.ResolvedDataSourc
 e$.apply(ResolvedDataSource.scala:158)
 at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)


 If you can please help, I would be grateful.

 Cheers,
 Ben


 On Jan 31, 2017, at 1:02 PM, Marton, Elek  wrote:


 I tested this one with hbase 1.2.4:

 https://github.com/hortonworks-spark/shc

 Marton

 On 01/31/2017 09:17 PM, Benjamin Kim wrote:

 Does anyone know how to backport the HBase Spark module to HBase 1.2.0?
 I tried to build it from source, but I cannot get it to work.

 Thanks,
 Ben
 -
 To unsubscribe e-mail: user-unsubscr...@spark.apache.org


 -
 To unsubscribe e-mail: user-unsubscr...@spark.apache.org



>>>
>>>
>>
>>
>
>


Spark submit on yarn does not return with exit code 1 on exception

2017-02-03 Thread Shashank Mandil
Hi All,

I wrote a test script which always throws an exception as below :

import org.apache.spark.SparkConf

object Test {

  def main(args: Array[String]): Unit = {
    try {
      // Build a Spark configuration; note that no SparkContext is ever created
      val conf = new SparkConf()
        .setAppName("Test")

      throw new RuntimeException("Some Exception")

      println("all done!")
    } catch {
      case e: RuntimeException =>
        println("There is an exception in the script exiting with status 1")
        System.exit(1)
    }
  }
}

When I run this code using spark-submit I am expecting to get an exit code
of 1,
however I keep getting exit code 0.

Any ideas how I can force spark-submit to return with code 1 ?

Thanks,
Shashank


Re: HBase Spark

2017-02-03 Thread Benjamin Kim
Asher,

You’re right. I don’t see anything but 2.11 being pulled in. Do you know where 
I can change this?

Cheers,
Ben


> On Feb 3, 2017, at 10:50 AM, Asher Krim  wrote:
> 
> Sorry for my persistence, but did you actually run "mvn dependency:tree 
> -Dverbose=true"? And did you see only scala 2.10.5 being pulled in?
> 
> On Fri, Feb 3, 2017 at 12:33 PM, Benjamin Kim  > wrote:
> Asher,
> 
> It’s still the same. Do you have any other ideas?
> 
> Cheers,
> Ben
> 
> 
>> On Feb 3, 2017, at 8:16 AM, Asher Krim > > wrote:
>> 
>> Did you check the actual maven dep tree? Something might be pulling in a 
>> different version. Also, if you're seeing this locally, you might want to 
>> check which version of the scala sdk your IDE is using
>> 
>> Asher Krim
>> Senior Software Engineer
>> 
>> 
>> On Thu, Feb 2, 2017 at 5:43 PM, Benjamin Kim > > wrote:
>> Hi Asher,
>> 
>> I modified the pom to be the same Spark (1.6.0), HBase (1.2.0), and Java 
>> (1.8) version as our installation. The Scala (2.10.5) version is already the 
>> same as ours. But I’m still getting the same error. Can you think of 
>> anything else?
>> 
>> Cheers,
>> Ben
>> 
>> 
>>> On Feb 2, 2017, at 11:06 AM, Asher Krim >> > wrote:
>>> 
>>> Ben,
>>> 
>>> That looks like a scala version mismatch. Have you checked your dep tree?
>>> 
>>> Asher Krim
>>> Senior Software Engineer
>>> 
>>> 
>>> On Thu, Feb 2, 2017 at 1:28 PM, Benjamin Kim >> > wrote:
>>> Elek,
>>> 
>>> Can you give me some sample code? I can’t get mine to work.
>>> 
>>> import org.apache.spark.sql.{SQLContext, _}
>>> import org.apache.spark.sql.execution.datasources.hbase._
>>> import org.apache.spark.{SparkConf, SparkContext}
>>> 
>>> def cat = s"""{
>>> |"table":{"namespace":"ben", "name":"dmp_test", 
>>> "tableCoder":"PrimitiveType"},
>>> |"rowkey":"key",
>>> |"columns":{
>>> |"col0":{"cf":"rowkey", "col":"key", "type":"string"},
>>> |"col1":{"cf":"d", "col":"google_gid", "type":"string"}
>>> |}
>>> |}""".stripMargin
>>> 
>>> import sqlContext.implicits._
>>> 
>>> def withCatalog(cat: String): DataFrame = {
>>> sqlContext
>>> .read
>>> .options(Map(HBaseTableCatalog.tableCatalog->cat))
>>> .format("org.apache.spark.sql.execution.datasources.hbase")
>>> .load()
>>> }
>>> 
>>> val df = withCatalog(cat)
>>> df.show
>>> 
>>> It gives me this error.
>>> 
>>> java.lang.NoSuchMethodError: 
>>> scala.runtime.ObjectRef.create(Ljava/lang/Object;)Lscala/runtime/ObjectRef;
>>> at 
>>> org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog$.apply(HBaseTableCatalog.scala:232)
>>> at 
>>> org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.(HBaseRelation.scala:77)
>>> at 
>>> org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:51)
>>> at 
>>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
>>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>>> 
>>> If you can please help, I would be grateful.
>>> 
>>> Cheers,
>>> Ben
>>> 
>>> 
 On Jan 31, 2017, at 1:02 PM, Marton, Elek > wrote:
 
 
 I tested this one with hbase 1.2.4:
 
 https://github.com/hortonworks-spark/shc 
 
 
 Marton
 
 On 01/31/2017 09:17 PM, Benjamin Kim wrote:
> Does anyone know how to backport the HBase Spark module to HBase 1.2.0? I 
> tried to build it from source, but I cannot get it to work.
> 
> Thanks,
> Ben
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org 
> 
> 
 
 -
 To unsubscribe e-mail: user-unsubscr...@spark.apache.org 
 
 
>>> 
>>> 
>> 
>> 
> 
> 



Re: NoNodeAvailableException (None of the configured nodes are available) error when trying to push data to Elastic from a Spark job

2017-02-03 Thread Anastasios Zouzias
Hi there,

Are you sure that the cluster nodes where the executors run have network
connectivity to the elastic cluster?

Speaking of which, why don't you use:
https://github.com/elastic/elasticsearch-hadoop#apache-spark ?

Cheers,
Anastasios
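
For reference, a minimal sketch of writing from Spark with that connector
instead of a raw TransportClient (the node names and the "myindex/mytype"
target are placeholders; it assumes the elasticsearch-spark artifact matching
your Spark/Scala version is on the classpath):

import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // from elasticsearch-hadoop

val conf = new SparkConf()
  .setAppName("es-write-example")
  .set("es.nodes", "es-node1,es-node2") // hypothetical Elasticsearch hosts
  .set("es.port", "9200")

val sc = new SparkContext(conf)

// Each Map becomes one document; "myindex/mytype" is a placeholder index/type
val docs = sc.parallelize(Seq(
  Map("id" -> 1, "text" -> "hello"),
  Map("id" -> 2, "text" -> "world")
))
docs.saveToEs("myindex/mytype")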

On Fri, Feb 3, 2017 at 7:10 PM, Dmitry Goldenberg 
wrote:

> Hi,
>
> Any reason why we might be getting this error?  The code seems to work
> fine in the non-distributed mode but the same code when run from a Spark
> job is not able to get to Elastic.
>
> Spark version: 2.0.1 built for Hadoop 2.4, Scala 2.11
> Elastic version: 2.3.1
>
> I've verified the Elastic hosts and the cluster name.
>
> The spot in the code where this happens is:
>
>  ClusterHealthResponse clusterHealthResponse = client.admin().cluster()
>
>   .prepareHealth()
>
>   .setWaitForGreenStatus()
>
>   .setTimeout(TimeValue.*timeValueSeconds*(10))
>
>   .get();
>
>
> Stack trace:
>
>
> Driver stacktrace:
>
> at org.apache.spark.scheduler.DAGScheduler.org
> $apache$spark$
> scheduler$DAGScheduler$$failJobAndIndependentStages(DAGSched
> uler.scala:1454)
>
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$
> 1.apply(DAGScheduler.scala:1442)
>
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$
> 1.apply(DAGScheduler.scala:1441)
>
> at scala.collection.mutable.ResizableArray$class.foreach(Resiza
> bleArray.scala:59)
>
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.
> scala:48)
>
> at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGSchedu
> ler.scala:1441)
>
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskS
> etFailed$1.apply(DAGScheduler.scala:811)
>
> at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskS
> etFailed$1.apply(DAGScheduler.scala:811)
>
> at scala.Option.foreach(Option.scala:257)
>
> at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(
> DAGScheduler.scala:811)
>
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOn
> Receive(DAGScheduler.scala:1667)
>
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onRe
> ceive(DAGScheduler.scala:1622)
>
> at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onRe
> ceive(DAGScheduler.scala:1611)
>
> at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>
> at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.
> scala:632)
>
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1890)
>
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1903)
>
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1916)
>
> at org.apache.spark.SparkContext.runJob(SparkContext.scala:1930)
>
> at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(
> RDD.scala:902)
>
> at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(
> RDD.scala:900)
>
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperati
> onScope.scala:151)
>
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperati
> onScope.scala:112)
>
> at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
>
> at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:900)
>
> at org.apache.spark.api.java.JavaRDDLike$class.foreachPartition
> (JavaRDDLike.scala:218)
>
> at org.apache.spark.api.java.AbstractJavaRDDLike.foreachPartiti
> on(JavaRDDLike.scala:45)
>
> at com.myco.MyDriver$3.call(com.myco.MyDriver.java:214)
>
> at com.myco.MyDriver$3.call(KafkaSparkStreamingDriver.java:201)
>
> at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun
> $foreachRDD$1.apply(JavaDStreamLike.scala:272)
>
> at org.apache.spark.streaming.api.java.JavaDStreamLike$$anonfun
> $foreachRDD$1.apply(JavaDStreamLike.scala:272)
>
> at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachR
> DD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:627)
>
> at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachR
> DD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:627)
>
> at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$
> 1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51)
>
> at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$
> 1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
>
> at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$
> 1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
>
> at org.apache.spark.streaming.dstream.DStream.createRDDWithLoca
> lProperties(DStream.scala:415)
>
> at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$
> 1.apply$mcV$sp(ForEachDStream.scala:50)
>
> at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$
> 1.apply(ForEachDStream.scala:50)

Re: HBase Spark

2017-02-03 Thread Asher Krim
Sorry for my persistence, but did you actually run "mvn dependency:tree
-Dverbose=true"? And did you see only scala 2.10.5 being pulled in?

On Fri, Feb 3, 2017 at 12:33 PM, Benjamin Kim  wrote:

> Asher,
>
> It’s still the same. Do you have any other ideas?
>
> Cheers,
> Ben
>
>
> On Feb 3, 2017, at 8:16 AM, Asher Krim  wrote:
>
> Did you check the actual maven dep tree? Something might be pulling in a
> different version. Also, if you're seeing this locally, you might want to
> check which version of the scala sdk your IDE is using
>
> Asher Krim
> Senior Software Engineer
>
> On Thu, Feb 2, 2017 at 5:43 PM, Benjamin Kim  wrote:
>
>> Hi Asher,
>>
>> I modified the pom to be the same Spark (1.6.0), HBase (1.2.0), and Java
>> (1.8) version as our installation. The Scala (2.10.5) version is already
>> the same as ours. But I’m still getting the same error. Can you think of
>> anything else?
>>
>> Cheers,
>> Ben
>>
>>
>> On Feb 2, 2017, at 11:06 AM, Asher Krim  wrote:
>>
>> Ben,
>>
>> That looks like a scala version mismatch. Have you checked your dep tree?
>>
>> Asher Krim
>> Senior Software Engineer
>>
>> On Thu, Feb 2, 2017 at 1:28 PM, Benjamin Kim  wrote:
>>
>>> Elek,
>>>
>>> Can you give me some sample code? I can’t get mine to work.
>>>
>>> import org.apache.spark.sql.{SQLContext, _}
>>> import org.apache.spark.sql.execution.datasources.hbase._
>>> import org.apache.spark.{SparkConf, SparkContext}
>>>
>>> def cat = s"""{
>>> |"table":{"namespace":"ben", "name":"dmp_test",
>>> "tableCoder":"PrimitiveType"},
>>> |"rowkey":"key",
>>> |"columns":{
>>> |"col0":{"cf":"rowkey", "col":"key", "type":"string"},
>>> |"col1":{"cf":"d", "col":"google_gid", "type":"string"}
>>> |}
>>> |}""".stripMargin
>>>
>>> import sqlContext.implicits._
>>>
>>> def withCatalog(cat: String): DataFrame = {
>>> sqlContext
>>> .read
>>> .options(Map(HBaseTableCatalog.tableCatalog->cat))
>>> .format("org.apache.spark.sql.execution.datasources.hbase")
>>> .load()
>>> }
>>>
>>> val df = withCatalog(cat)
>>> df.show
>>>
>>>
>>> It gives me this error.
>>>
>>> java.lang.NoSuchMethodError: scala.runtime.ObjectRef.create
>>> (Ljava/lang/Object;)Lscala/runtime/ObjectRef;
>>> at org.apache.spark.sql.execution.datasources.hbase.HBaseTableC
>>> atalog$.apply(HBaseTableCatalog.scala:232)
>>> at org.apache.spark.sql.execution.datasources.hbase.HBaseRelati
>>> on.(HBaseRelation.scala:77)
>>> at org.apache.spark.sql.execution.datasources.hbase.DefaultSour
>>> ce.createRelation(HBaseRelation.scala:51)
>>> at org.apache.spark.sql.execution.datasources.ResolvedDataSourc
>>> e$.apply(ResolvedDataSource.scala:158)
>>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>>>
>>>
>>> If you can please help, I would be grateful.
>>>
>>> Cheers,
>>> Ben
>>>
>>>
>>> On Jan 31, 2017, at 1:02 PM, Marton, Elek  wrote:
>>>
>>>
>>> I tested this one with hbase 1.2.4:
>>>
>>> https://github.com/hortonworks-spark/shc
>>>
>>> Marton
>>>
>>> On 01/31/2017 09:17 PM, Benjamin Kim wrote:
>>>
>>> Does anyone know how to backport the HBase Spark module to HBase 1.2.0?
>>> I tried to build it from source, but I cannot get it to work.
>>>
>>> Thanks,
>>> Ben
>>> -
>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>>
>>>
>>> -
>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>>
>>>
>>>
>>
>>
>
>


sqlContext vs spark.

2017-02-03 Thread रविशंकर नायर
All,

In Spark 1.6.0, we used

val jdbcDF = sqlContext.read.format(-)

for creating a data frame through JDBC.

In Spark 2.1.x, we have seen this is
val jdbcDF = *spark*.read.format(-)

Does that mean we should not be using sqlContext going forward? Also, we
see that sqlContext is not auto initialized while running spark-shell.
Please advise, thanks

Best, Ravion


Re: [ML] MLeap: Deploy Spark ML Pipelines w/o SparkContext

2017-02-03 Thread Asher Krim
I have a bunch of questions for you Hollin:

How easy is it to add support for custom pipelines/models?
Are Spark mllib models supported?
We currently run spark in local mode in an api service. It's not super
terrible, but performance is a constant struggle. Have you benchmarked any
performance differences between MLeap and vanilla Spark?
What does Tensorflow support look like? I would love to serve models from a
java stack while being agnostic to what framework was used to train them.

Thanks,
Asher Krim
Senior Software Engineer

On Fri, Feb 3, 2017 at 11:53 AM, Hollin Wilkins  wrote:

> Hey Aseem,
>
> We have built pipelines that execute several string indexers, one hot
> encoders, scaling, and a random forest or linear regression at the end.
> Execution time for the linear regression was on the order of 11
> microseconds, a bit longer for random forest. This can be further optimized
> by using row-based transformations if your pipeline is simple to around 2-3
> microseconds. The pipeline operated on roughly 12 input features, and by
> the time all the processing was done, we had somewhere around 1000 features
> or so going into the linear regression after one hot encoding and
> everything else.
>
> Hope this helps,
> Hollin
>
> On Fri, Feb 3, 2017 at 4:05 AM, Aseem Bansal  wrote:
>
>> Does this support Java 7?
>>
>> On Fri, Feb 3, 2017 at 5:30 PM, Aseem Bansal 
>> wrote:
>>
>>> Is computational time for predictions on the order of few milliseconds
>>> (< 10 ms) like the old mllib library?
>>>
>>> On Thu, Feb 2, 2017 at 10:12 PM, Hollin Wilkins 
>>> wrote:
>>>
 Hey everyone,


 Some of you may have seen Mikhail and I talk at Spark/Hadoop Summits
 about MLeap and how you can use it to build production services from your
 Spark-trained ML pipelines. MLeap is an open-source technology that allows
 Data Scientists and Engineers to deploy Spark-trained ML Pipelines and
 Models to a scoring engine instantly. The MLeap execution engine has no
 dependencies on a Spark context and the serialization format is entirely
 based on Protobuf 3 and JSON.


 The recent 0.5.0 release provides serialization and inference support
 for close to 100% of Spark transformers (we don’t yet support ALS and LDA).


 MLeap is open-source, take a look at our Github page:

 https://github.com/combust/mleap


 Or join the conversation on Gitter:

 https://gitter.im/combust/mleap


 We have a set of documentation to help get you started here:

 http://mleap-docs.combust.ml/


 We even have a set of demos, for training ML Pipelines and linear,
 logistic and random forest models:

 https://github.com/combust/mleap-demo


 Check out our latest MLeap-serving Docker image, which allows you to
 expose a REST interface to your Spark ML pipeline models:

 http://mleap-docs.combust.ml/mleap-serving/


 Several companies are using MLeap in production and even more are
 currently evaluating it. Take a look and tell us what you think! We hope to
 talk with you soon and welcome feedback/suggestions!


 Sincerely,

 Hollin and Mikhail

>>>
>>>
>>
>


Re: saveToCassandra issue. Please help

2017-02-03 Thread shyla deshpande
Thanks, Fernando. But I need to have only one row for a given (user, date)
combination with very low latency, so neither of your options works for me.
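
For reference (in case it helps others), this is roughly what option 1
(grouping before the insert) would look like: a sketch that assumes the
spark-cassandra-connector and uses placeholder keyspace/table names:

import java.time.Duration
import org.apache.spark.rdd.RDD
import com.datastax.spark.connector._ // spark-cassandra-connector

// Rows are (userId, date, period, dayOfWeek); collapse to one row per
// (userId, date), keeping the larger ISO-8601 period (e.g. PT0H10M0S over PT0H9M30S).
def saveLatestPerUserDate(rows: RDD[(String, String, String, String)]): Unit = {
  val deduped = rows
    .keyBy { case (user, date, _, _) => (user, date) }
    .reduceByKey { (a, b) =>
      if (Duration.parse(a._3).compareTo(Duration.parse(b._3)) >= 0) a else b
    }
    .values

  // Column order must match the target table; keyspace/table names are hypothetical
  deduped.saveToCassandra("my_keyspace", "user_activity")
}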



On Fri, Feb 3, 2017 at 10:34 AM, Fernando Avalos  wrote:

> Hi Shyla,
>
> Maybe I am wrong, but I can see two options here.
>
> 1.- Do some grouping before insert to Cassandra.
> 2.- Insert to cassandra all the entries and add some logic to your
> request to get the most recent.
>
> Regards,
>
> 2017-02-03 10:26 GMT-08:00 shyla deshpande :
> > Hi All,
> >
> > I wanted to add more info ..
> > The first column is the user and the third is the period. and my key is
> > (userid, date) For a given user and date combination I want to see only 1
> > row. My problem is that PT0H10M0S is overwritten by PT0H9M30S, even
> though
> > the order of the rows in the RDD is PT0H9M30S and then PT0H10M0S.
> >
> > Appreciate your input. Thanks
> >
> > On Fri, Feb 3, 2017 at 12:45 AM, shyla deshpande <
> deshpandesh...@gmail.com>
> > wrote:
> >>
> >> Hello All,
> >>
> >> This is the content of my RDD which I am saving to Cassandra table.
> >>
> >> But looks like the 2nd row is written first and then the first row
> >> overwrites it. So I end up with bad output.
> >>
> >> (494bce4f393b474980290b8d1b6ebef9, 2017-02-01, PT0H9M30S, WEDNESDAY)
> >> (494bce4f393b474980290b8d1b6ebef9, 2017-02-01, PT0H10M0S, WEDNESDAY)
> >>
> >> Is there a way to force the order of the rows written to Cassandra.
> >>
> >> Please help.
> >>
> >> Thanks
> >
> >
>


Re: saveToCassandra issue. Please help

2017-02-03 Thread shyla deshpande
Hi All,

I wanted to add more info ..
The first column is the user and the third is the period, and my key is
(userid, date). For a given user and date combination I want to see only one
row. My problem is that PT0H10M0S is overwritten by PT0H9M30S, even though
the order of the rows in the RDD is PT0H9M30S and then PT0H10M0S.

Appreciate your input. Thanks

On Fri, Feb 3, 2017 at 12:45 AM, shyla deshpande 
wrote:

> Hello All,
>
> This is the content of my RDD which I am saving to Cassandra table.
>
> But looks like the 2nd row is written first and then the first row overwrites 
> it. So I end up with bad output.
>
> (494bce4f393b474980290b8d1b6ebef9, 2017-02-01, PT0H9M30S, WEDNESDAY)
> (494bce4f393b474980290b8d1b6ebef9, 2017-02-01, PT0H10M0S, WEDNESDAY)
>
> Is there a way to force the order of the rows written to Cassandra.
>
> Please help.
>
> Thanks
>
>


NoNodeAvailableException (None of the configured nodes are available) error when trying to push data to Elastic from a Spark job

2017-02-03 Thread Dmitry Goldenberg
Hi,

Any reason why we might be getting this error?  The code seems to work fine
in the non-distributed mode but the same code when run from a Spark job is
not able to get to Elastic.

Spark version: 2.0.1 built for Hadoop 2.4, Scala 2.11
Elastic version: 2.3.1

I've verified the Elastic hosts and the cluster name.

The spot in the code where this happens is:

 ClusterHealthResponse clusterHealthResponse = client.admin().cluster()

  .prepareHealth()

  .setWaitForGreenStatus()

  .setTimeout(TimeValue.*timeValueSeconds*(10))

  .get();


Stack trace:


Driver stacktrace:

at org.apache.spark.scheduler.DAGScheduler.org
$apache$spark$
scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1454)

at org.apache.spark.scheduler.DAGScheduler$$anonfun$
abortStage$1.apply(DAGScheduler.scala:1442)

at org.apache.spark.scheduler.DAGScheduler$$anonfun$
abortStage$1.apply(DAGScheduler.scala:1441)

at scala.collection.mutable.ResizableArray$class.foreach(
ResizableArray.scala:59)

at scala.collection.mutable.ArrayBuffer.foreach(
ArrayBuffer.scala:48)

at org.apache.spark.scheduler.DAGScheduler.abortStage(
DAGScheduler.scala:1441)

at org.apache.spark.scheduler.DAGScheduler$$anonfun$
handleTaskSetFailed$1.apply(DAGScheduler.scala:811)

at org.apache.spark.scheduler.DAGScheduler$$anonfun$
handleTaskSetFailed$1.apply(DAGScheduler.scala:811)

at scala.Option.foreach(Option.scala:257)

at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(
DAGScheduler.scala:811)

at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.
doOnReceive(DAGScheduler.scala:1667)

at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.
onReceive(DAGScheduler.scala:1622)

at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.
onReceive(DAGScheduler.scala:1611)

at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

at org.apache.spark.scheduler.DAGScheduler.runJob(
DAGScheduler.scala:632)

at org.apache.spark.SparkContext.runJob(SparkContext.scala:1890)

at org.apache.spark.SparkContext.runJob(SparkContext.scala:1903)

at org.apache.spark.SparkContext.runJob(SparkContext.scala:1916)

at org.apache.spark.SparkContext.runJob(SparkContext.scala:1930)

at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.
apply(RDD.scala:902)

at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.
apply(RDD.scala:900)

at org.apache.spark.rdd.RDDOperationScope$.withScope(
RDDOperationScope.scala:151)

at org.apache.spark.rdd.RDDOperationScope$.withScope(
RDDOperationScope.scala:112)

at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)

at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:900)

at org.apache.spark.api.java.JavaRDDLike$class.
foreachPartition(JavaRDDLike.scala:218)

at org.apache.spark.api.java.AbstractJavaRDDLike.
foreachPartition(JavaRDDLike.scala:45)

at com.myco.MyDriver$3.call(com.myco.MyDriver.java:214)

at com.myco.MyDriver$3.call(KafkaSparkStreamingDriver.java:201)

at org.apache.spark.streaming.api.java.JavaDStreamLike$$
anonfun$foreachRDD$1.apply(JavaDStreamLike.scala:272)

at org.apache.spark.streaming.api.java.JavaDStreamLike$$
anonfun$foreachRDD$1.apply(JavaDStreamLike.scala:272)

at org.apache.spark.streaming.dstream.DStream$$anonfun$
foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:627)

at org.apache.spark.streaming.dstream.DStream$$anonfun$
foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:627)

at org.apache.spark.streaming.dstream.ForEachDStream$$
anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51)

at org.apache.spark.streaming.dstream.ForEachDStream$$
anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)

at org.apache.spark.streaming.dstream.ForEachDStream$$
anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)

at org.apache.spark.streaming.dstream.DStream.
createRDDWithLocalProperties(DStream.scala:415)

at org.apache.spark.streaming.dstream.ForEachDStream$$
anonfun$1.apply$mcV$sp(ForEachDStream.scala:50)

at org.apache.spark.streaming.dstream.ForEachDStream$$
anonfun$1.apply(ForEachDStream.scala:50)

at org.apache.spark.streaming.dstream.ForEachDStream$$
anonfun$1.apply(ForEachDStream.scala:50)

at scala.util.Try$.apply(Try.scala:192)

at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)

at org.apache.spark.streaming.scheduler.JobScheduler$
JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:247)

at org.apache.spark.streaming.scheduler.JobScheduler$
JobHandler$$anonfun$run$1.apply(JobScheduler.scala:247)

at org.apache.spark.streaming.scheduler.JobScheduler$

Re: HBase Spark

2017-02-03 Thread Benjamin Kim
Asher,

It’s still the same. Do you have any other ideas?

Cheers,
Ben


> On Feb 3, 2017, at 8:16 AM, Asher Krim  wrote:
> 
> Did you check the actual maven dep tree? Something might be pulling in a 
> different version. Also, if you're seeing this locally, you might want to 
> check which version of the scala sdk your IDE is using
> 
> Asher Krim
> Senior Software Engineer
> 
> 
> On Thu, Feb 2, 2017 at 5:43 PM, Benjamin Kim  > wrote:
> Hi Asher,
> 
> I modified the pom to be the same Spark (1.6.0), HBase (1.2.0), and Java 
> (1.8) version as our installation. The Scala (2.10.5) version is already the 
> same as ours. But I’m still getting the same error. Can you think of anything 
> else?
> 
> Cheers,
> Ben
> 
> 
>> On Feb 2, 2017, at 11:06 AM, Asher Krim > > wrote:
>> 
>> Ben,
>> 
>> That looks like a scala version mismatch. Have you checked your dep tree?
>> 
>> Asher Krim
>> Senior Software Engineer
>> 
>> 
>> On Thu, Feb 2, 2017 at 1:28 PM, Benjamin Kim > > wrote:
>> Elek,
>> 
>> Can you give me some sample code? I can’t get mine to work.
>> 
>> import org.apache.spark.sql.{SQLContext, _}
>> import org.apache.spark.sql.execution.datasources.hbase._
>> import org.apache.spark.{SparkConf, SparkContext}
>> 
>> def cat = s"""{
>> |"table":{"namespace":"ben", "name":"dmp_test", 
>> "tableCoder":"PrimitiveType"},
>> |"rowkey":"key",
>> |"columns":{
>> |"col0":{"cf":"rowkey", "col":"key", "type":"string"},
>> |"col1":{"cf":"d", "col":"google_gid", "type":"string"}
>> |}
>> |}""".stripMargin
>> 
>> import sqlContext.implicits._
>> 
>> def withCatalog(cat: String): DataFrame = {
>> sqlContext
>> .read
>> .options(Map(HBaseTableCatalog.tableCatalog->cat))
>> .format("org.apache.spark.sql.execution.datasources.hbase")
>> .load()
>> }
>> 
>> val df = withCatalog(cat)
>> df.show
>> 
>> It gives me this error.
>> 
>> java.lang.NoSuchMethodError: 
>> scala.runtime.ObjectRef.create(Ljava/lang/Object;)Lscala/runtime/ObjectRef;
>>  at 
>> org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog$.apply(HBaseTableCatalog.scala:232)
>>  at 
>> org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.(HBaseRelation.scala:77)
>>  at 
>> org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:51)
>>  at 
>> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
>>  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>> 
>> If you can please help, I would be grateful.
>> 
>> Cheers,
>> Ben
>> 
>> 
>>> On Jan 31, 2017, at 1:02 PM, Marton, Elek >> > wrote:
>>> 
>>> 
>>> I tested this one with hbase 1.2.4:
>>> 
>>> https://github.com/hortonworks-spark/shc 
>>> 
>>> 
>>> Marton
>>> 
>>> On 01/31/2017 09:17 PM, Benjamin Kim wrote:
 Does anyone know how to backport the HBase Spark module to HBase 1.2.0? I 
 tried to build it from source, but I cannot get it to work.
 
 Thanks,
 Ben
 -
 To unsubscribe e-mail: user-unsubscr...@spark.apache.org 
 
 
>>> 
>>> -
>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org 
>>> 
>>> 
>> 
>> 
> 
> 



Re: HBase Spark

2017-02-03 Thread Benjamin Kim
I'll clean up any .m2 or .ivy directories and try again.

I ran this on our lab cluster for testing.

Cheers,
Ben


On Fri, Feb 3, 2017 at 8:16 AM Asher Krim  wrote:

> Did you check the actual maven dep tree? Something might be pulling in a
> different version. Also, if you're seeing this locally, you might want to
> check which version of the scala sdk your IDE is using
>
> Asher Krim
> Senior Software Engineer
>
> On Thu, Feb 2, 2017 at 5:43 PM, Benjamin Kim  wrote:
>
> Hi Asher,
>
> I modified the pom to be the same Spark (1.6.0), HBase (1.2.0), and Java
> (1.8) version as our installation. The Scala (2.10.5) version is already
> the same as ours. But I’m still getting the same error. Can you think of
> anything else?
>
> Cheers,
> Ben
>
>
> On Feb 2, 2017, at 11:06 AM, Asher Krim  wrote:
>
> Ben,
>
> That looks like a scala version mismatch. Have you checked your dep tree?
>
> Asher Krim
> Senior Software Engineer
>
> On Thu, Feb 2, 2017 at 1:28 PM, Benjamin Kim  wrote:
>
> Elek,
>
> Can you give me some sample code? I can’t get mine to work.
>
> import org.apache.spark.sql.{SQLContext, _}
> import org.apache.spark.sql.execution.datasources.hbase._
> import org.apache.spark.{SparkConf, SparkContext}
>
> def cat = s"""{
> |"table":{"namespace":"ben", "name":"dmp_test",
> "tableCoder":"PrimitiveType"},
> |"rowkey":"key",
> |"columns":{
> |"col0":{"cf":"rowkey", "col":"key", "type":"string"},
> |"col1":{"cf":"d", "col":"google_gid", "type":"string"}
> |}
> |}""".stripMargin
>
> import sqlContext.implicits._
>
> def withCatalog(cat: String): DataFrame = {
> sqlContext
> .read
> .options(Map(HBaseTableCatalog.tableCatalog->cat))
> .format("org.apache.spark.sql.execution.datasources.hbase")
> .load()
> }
>
> val df = withCatalog(cat)
> df.show
>
>
> It gives me this error.
>
> java.lang.NoSuchMethodError:
> scala.runtime.ObjectRef.create(Ljava/lang/Object;)Lscala/runtime/ObjectRef;
> at
> org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog$.apply(HBaseTableCatalog.scala:232)
> at
> org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.(HBaseRelation.scala:77)
> at
> org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:51)
> at
> org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>
>
> If you can please help, I would be grateful.
>
> Cheers,
> Ben
>
>
> On Jan 31, 2017, at 1:02 PM, Marton, Elek  wrote:
>
>
> I tested this one with hbase 1.2.4:
>
> https://github.com/hortonworks-spark/shc
>
> Marton
>
> On 01/31/2017 09:17 PM, Benjamin Kim wrote:
>
> Does anyone know how to backport the HBase Spark module to HBase 1.2.0? I
> tried to build it from source, but I cannot get it to work.
>
> Thanks,
> Ben
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>
>
>
>
>


Re: [ML] MLeap: Deploy Spark ML Pipelines w/o SparkContext

2017-02-03 Thread Hollin Wilkins
Hey Aseem,

We have built pipelines that execute several string indexers, one-hot
encoders, scaling, and a random forest or linear regression at the end.
Execution time for the linear regression was on the order of 11
microseconds, a bit longer for random forest. If your pipeline is simple,
this can be further optimized to around 2-3 microseconds by using row-based
transformations. The pipeline operated on roughly 12 input features, and by
the time all the processing was done, we had somewhere around 1000 features
or so going into the linear regression after one-hot encoding and
everything else.

Hope this helps,
Hollin

On Fri, Feb 3, 2017 at 4:05 AM, Aseem Bansal  wrote:

> Does this support Java 7?
>
> On Fri, Feb 3, 2017 at 5:30 PM, Aseem Bansal  wrote:
>
>> Is computational time for predictions on the order of few milliseconds (<
>> 10 ms) like the old mllib library?
>>
>> On Thu, Feb 2, 2017 at 10:12 PM, Hollin Wilkins 
>> wrote:
>>
>>> Hey everyone,
>>>
>>>
>>> Some of you may have seen Mikhail and I talk at Spark/Hadoop Summits
>>> about MLeap and how you can use it to build production services from your
>>> Spark-trained ML pipelines. MLeap is an open-source technology that allows
>>> Data Scientists and Engineers to deploy Spark-trained ML Pipelines and
>>> Models to a scoring engine instantly. The MLeap execution engine has no
>>> dependencies on a Spark context and the serialization format is entirely
>>> based on Protobuf 3 and JSON.
>>>
>>>
>>> The recent 0.5.0 release provides serialization and inference support
>>> for close to 100% of Spark transformers (we don’t yet support ALS and LDA).
>>>
>>>
>>> MLeap is open-source, take a look at our Github page:
>>>
>>> https://github.com/combust/mleap
>>>
>>>
>>> Or join the conversation on Gitter:
>>>
>>> https://gitter.im/combust/mleap
>>>
>>>
>>> We have a set of documentation to help get you started here:
>>>
>>> http://mleap-docs.combust.ml/
>>>
>>>
>>> We even have a set of demos, for training ML Pipelines and linear,
>>> logistic and random forest models:
>>>
>>> https://github.com/combust/mleap-demo
>>>
>>>
>>> Check out our latest MLeap-serving Docker image, which allows you to
>>> expose a REST interface to your Spark ML pipeline models:
>>>
>>> http://mleap-docs.combust.ml/mleap-serving/
>>>
>>>
>>> Several companies are using MLeap in production and even more are
>>> currently evaluating it. Take a look and tell us what you think! We hope to
>>> talk with you soon and welcome feedback/suggestions!
>>>
>>>
>>> Sincerely,
>>>
>>> Hollin and Mikhail
>>>
>>
>>
>


Re: HBase Spark

2017-02-03 Thread Asher Krim
Did you check the actual Maven dependency tree? Something might be pulling
in a different version. Also, if you're seeing this locally, you might want
to check which version of the Scala SDK your IDE is using.
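
(For what it's worth, a NoSuchMethodError on scala.runtime.ObjectRef.create
usually means code compiled against Scala 2.11 is running on a Scala 2.10
runtime, since ObjectRef.create was only added in 2.11.) A quick sanity
check from the spark-shell, just as a sketch:

// which Scala version is the shell/runtime actually on?
println(scala.util.Properties.versionString)   // e.g. "version 2.10.5"

Then make sure the shc artifact you pull in matches it (the _2.10 vs _2.11
flavor of the jar).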

Asher Krim
Senior Software Engineer

On Thu, Feb 2, 2017 at 5:43 PM, Benjamin Kim  wrote:

> Hi Asher,
>
> I modified the pom to be the same Spark (1.6.0), HBase (1.2.0), and Java
> (1.8) version as our installation. The Scala (2.10.5) version is already
> the same as ours. But I’m still getting the same error. Can you think of
> anything else?
>
> Cheers,
> Ben
>
>
> On Feb 2, 2017, at 11:06 AM, Asher Krim  wrote:
>
> Ben,
>
> That looks like a scala version mismatch. Have you checked your dep tree?
>
> Asher Krim
> Senior Software Engineer
>
> On Thu, Feb 2, 2017 at 1:28 PM, Benjamin Kim  wrote:
>
>> Elek,
>>
>> Can you give me some sample code? I can’t get mine to work.
>>
>> import org.apache.spark.sql.{SQLContext, _}
>> import org.apache.spark.sql.execution.datasources.hbase._
>> import org.apache.spark.{SparkConf, SparkContext}
>>
>> def cat = s"""{
>> |"table":{"namespace":"ben", "name":"dmp_test",
>> "tableCoder":"PrimitiveType"},
>> |"rowkey":"key",
>> |"columns":{
>> |"col0":{"cf":"rowkey", "col":"key", "type":"string"},
>> |"col1":{"cf":"d", "col":"google_gid", "type":"string"}
>> |}
>> |}""".stripMargin
>>
>> import sqlContext.implicits._
>>
>> def withCatalog(cat: String): DataFrame = {
>> sqlContext
>> .read
>> .options(Map(HBaseTableCatalog.tableCatalog->cat))
>> .format("org.apache.spark.sql.execution.datasources.hbase")
>> .load()
>> }
>>
>> val df = withCatalog(cat)
>> df.show
>>
>>
>> It gives me this error.
>>
>> java.lang.NoSuchMethodError: scala.runtime.ObjectRef.create
>> (Ljava/lang/Object;)Lscala/runtime/ObjectRef;
>> at org.apache.spark.sql.execution.datasources.hbase.HBaseTableC
>> atalog$.apply(HBaseTableCatalog.scala:232)
>> at org.apache.spark.sql.execution.datasources.hbase.HBaseRelati
>> on.(HBaseRelation.scala:77)
>> at org.apache.spark.sql.execution.datasources.hbase.DefaultSour
>> ce.createRelation(HBaseRelation.scala:51)
>> at org.apache.spark.sql.execution.datasources.ResolvedDataSourc
>> e$.apply(ResolvedDataSource.scala:158)
>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
>>
>>
>> If you can please help, I would be grateful.
>>
>> Cheers,
>> Ben
>>
>>
>> On Jan 31, 2017, at 1:02 PM, Marton, Elek  wrote:
>>
>>
>> I tested this one with hbase 1.2.4:
>>
>> https://github.com/hortonworks-spark/shc
>>
>> Marton
>>
>> On 01/31/2017 09:17 PM, Benjamin Kim wrote:
>>
>> Does anyone know how to backport the HBase Spark module to HBase 1.2.0? I
>> tried to build it from source, but I cannot get it to work.
>>
>> Thanks,
>> Ben
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>
>>
>
>


Is DoubleWritable and DoubleObjectInspector doing the same thing in Hive UDF?

2017-02-03 Thread Alex
Hi,

Can you guys tell me whether the following two pieces of code return the
same thing? Specifically, is (((DoubleObjectInspector) ins2).get(obj)) in
code 2 equivalent to ((DoubleWritable) obj).get() in code 1?


code 1)

public Object get(Object name) {
  int pos = getPos((String) name);
  if (pos < 0) return null;
  String f = "string";
  Object obj = list.get(pos);
  if (obj == null) return null;
  ObjectInspector ins = ((StructField) colnames.get(pos)).getFieldObjectInspector();
  if (ins != null) f = ins.getTypeName();
  switch (f) {
    case "double" : return ((DoubleWritable) obj).get();
    case "bigint" : return ((LongWritable) obj).get();
    case "string" : return ((Text) obj).toString();
    default       : return obj;
  }
}


Code 2)

public Object get(Object name) {
  int pos = getPos((String) name);
  if (pos < 0) return null;
  String f = "string";
  String f1 = "string";
  Object obj = list.get(pos);
  if (obj == null) return null;
  ObjectInspector ins = ((StructField) colnames.get(pos)).getFieldObjectInspector();
  if (ins != null) f = ins.getTypeName();

  PrimitiveObjectInspector ins2 = (PrimitiveObjectInspector) ins;
  f1 = ins2.getPrimitiveCategory().name();

  switch (ins2.getPrimitiveCategory()) {
    case DOUBLE:
      return (((DoubleObjectInspector) ins2).get(obj));
    case LONG:
      return (((LongObjectInspector) ins2).get(obj));
    case STRING:
      return (((StringObjectInspector) ins2).getPrimitiveJavaObject(obj)).toString();
    default:
      return obj;
  }
}


problem with the method JavaDStream.foreachRDD() SparkStreaming

2017-02-03 Thread Hamza HACHANI
Hi,

I'm new to Spark Streaming.

I'm using the 2.10 versions of spark-core and spark-streaming.

My issue is that when I try to use JavaPairDStream.foreachRDD:

 test.foreachRDD(new Function,Void>() {
public Void call(JavaPairRDD rdd) {
  currentResponseCodeCounts = rdd.take(100);
  return null;
}});


it says this:

The method foreachRDD(Function ,Void>) in the type 
AbstractJavaDStreamLike,JavaPairDStream,JavaPairRDD>
 is not applicable for the arguments (new 
Function,Void>(){})


I can't figure out what I'm missing.

Thanks.



Re: [ML] MLeap: Deploy Spark ML Pipelines w/o SparkContext

2017-02-03 Thread Aseem Bansal
Does this support Java 7?

On Fri, Feb 3, 2017 at 5:30 PM, Aseem Bansal  wrote:

> Is computational time for predictions on the order of few milliseconds (<
> 10 ms) like the old mllib library?
>
> On Thu, Feb 2, 2017 at 10:12 PM, Hollin Wilkins  wrote:
>
>> Hey everyone,
>>
>>
>> Some of you may have seen Mikhail and I talk at Spark/Hadoop Summits
>> about MLeap and how you can use it to build production services from your
>> Spark-trained ML pipelines. MLeap is an open-source technology that allows
>> Data Scientists and Engineers to deploy Spark-trained ML Pipelines and
>> Models to a scoring engine instantly. The MLeap execution engine has no
>> dependencies on a Spark context and the serialization format is entirely
>> based on Protobuf 3 and JSON.
>>
>>
>> The recent 0.5.0 release provides serialization and inference support for
>> close to 100% of Spark transformers (we don’t yet support ALS and LDA).
>>
>>
>> MLeap is open-source, take a look at our Github page:
>>
>> https://github.com/combust/mleap
>>
>>
>> Or join the conversation on Gitter:
>>
>> https://gitter.im/combust/mleap
>>
>>
>> We have a set of documentation to help get you started here:
>>
>> http://mleap-docs.combust.ml/
>>
>>
>> We even have a set of demos, for training ML Pipelines and linear,
>> logistic and random forest models:
>>
>> https://github.com/combust/mleap-demo
>>
>>
>> Check out our latest MLeap-serving Docker image, which allows you to
>> expose a REST interface to your Spark ML pipeline models:
>>
>> http://mleap-docs.combust.ml/mleap-serving/
>>
>>
>> Several companies are using MLeap in production and even more are
>> currently evaluating it. Take a look and tell us what you think! We hope to
>> talk with you soon and welcome feedback/suggestions!
>>
>>
>> Sincerely,
>>
>> Hollin and Mikhail
>>
>
>


Re: [ML] MLeap: Deploy Spark ML Pipelines w/o SparkContext

2017-02-03 Thread Aseem Bansal
Is the computational time for predictions on the order of a few
milliseconds (< 10 ms), like the old mllib library?

On Thu, Feb 2, 2017 at 10:12 PM, Hollin Wilkins  wrote:

> Hey everyone,
>
>
> Some of you may have seen Mikhail and I talk at Spark/Hadoop Summits about
> MLeap and how you can use it to build production services from your
> Spark-trained ML pipelines. MLeap is an open-source technology that allows
> Data Scientists and Engineers to deploy Spark-trained ML Pipelines and
> Models to a scoring engine instantly. The MLeap execution engine has no
> dependencies on a Spark context and the serialization format is entirely
> based on Protobuf 3 and JSON.
>
>
> The recent 0.5.0 release provides serialization and inference support for
> close to 100% of Spark transformers (we don’t yet support ALS and LDA).
>
>
> MLeap is open-source, take a look at our Github page:
>
> https://github.com/combust/mleap
>
>
> Or join the conversation on Gitter:
>
> https://gitter.im/combust/mleap
>
>
> We have a set of documentation to help get you started here:
>
> http://mleap-docs.combust.ml/
>
>
> We even have a set of demos, for training ML Pipelines and linear,
> logistic and random forest models:
>
> https://github.com/combust/mleap-demo
>
>
> Check out our latest MLeap-serving Docker image, which allows you to
> expose a REST interface to your Spark ML pipeline models:
>
> http://mleap-docs.combust.ml/mleap-serving/
>
>
> Several companies are using MLeap in production and even more are
> currently evaluating it. Take a look and tell us what you think! We hope to
> talk with you soon and welcome feedback/suggestions!
>
>
> Sincerely,
>
> Hollin and Mikhail
>


Bipartite projection with Graphx

2017-02-03 Thread balaji9058
Hi,

Is a bipartite projection possible with GraphX?

Rdd1
#id name
1   x1
2   x2
3   x3
4   x4
5   x5
6   x6
7   x7
8   x8

Rdd2
#id name
10001   y1
10002   y2
10003   y3
10004   y4
10005   y5
10006   y6

EdgeList
#src id Dest id
1   10001
1   10002
2   10001
2   10002
2   10004
3   10003
3   10005
4   10001
4   10004
5   10003
5   10005
6   10003
6   10006
7   10005
7   10006
8   10005

val nodes = Rdd1 ++ Rdd2
val Network = Graph(nodes, links)

With the above network I need to create a projection graph with weighted
edges like x1-x2 (see the image in the wiki link below), for example:

https://en.wikipedia.org/wiki/Bipartite_network_projection
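
As far as I know GraphX has no built-in projection operator, but the
weighted one-mode projection can be computed from the edge list with a
self-join. A minimal sketch, assuming links is the RDD[Edge[...]] used to
build the graph above and Rdd1 holds the x-side vertices as (VertexId, name)
pairs:

import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// bipartite edges as (xId, yId) pairs, taken from the links RDD
val pairs: RDD[(VertexId, VertexId)] = links.map(e => (e.srcId, e.dstId))

// key by the y-side and self-join: x-nodes sharing a y-neighbour get paired up
// (note: the self-join fans out quadratically on very high-degree y-nodes)
val byY = pairs.map(_.swap)                        // (yId, xId)

val projectedEdges: RDD[Edge[Int]] = byY.join(byY) // (yId, (x1, x2))
  .map { case (_, (x1, x2)) => (x1, x2) }
  .filter { case (x1, x2) => x1 < x2 }             // drop self-pairs and mirrored duplicates
  .map(p => (p, 1))
  .reduceByKey(_ + _)                              // weight = number of shared y-neighbours
  .map { case ((x1, x2), w) => Edge(x1, x2, w) }

val projection = Graph(Rdd1, projectedEdges)       // graph over the x-side only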





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Bipartite-projection-with-Graphx-tp28360.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Suprised!!!!!Spark-shell showing inconsistent results

2017-02-03 Thread Alex
Hi Team,

Actually, I figured out something.

When the Hive Java UDF is executed in Hive it gives output with 10 decimal
digits of precision, but in Spark the same UDF gives results rounded off to
6 decimal digits. How do I stop that? The same Java UDF jar files are used
in both Hive and Spark.

[image: Inline image 1]
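
One thing worth ruling out first is whether Spark is treating the UDF's
return value as a float somewhere (a float only carries about 7 significant
digits, which would match output rounded to 6 decimals), rather than it
being purely a display issue. A small sketch for checking, with my_udf used
as a placeholder for the real UDF name:

val df = hc.sql("select my_udf(col1) as v from testtable1 limit 10")
df.printSchema()   // is v reported as double, or as float?
df.show(false)     // print the values without truncation

// casting to an explicit decimal will not recover digits already lost to a
// float, but it rules out formatting as the cause
df.selectExpr("cast(v as decimal(20,10)) as v10").show(false)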



On Thu, Feb 2, 2017 at 3:33 PM, Alex  wrote:

> Hi As shown below same query when ran back to back showing inconsistent
> results..
>
> testtable1 is Avro Serde table...
>
> [image: Inline image 1]
>
>
>
>  hc.sql("select * from testtable1 order by col1 limit 1").collect;
> res14: Array[org.apache.spark.sql.Row] = Array([1570,3364,201607,Y,APJ,
> PHILIPPINES,8518944,null,null,null,null,-15.992583,0.0,-15.
> 992583,null,null,MONTH_ITEM_GROUP])
>
> scala> hc.sql("select * from testtable1 order by col1 limit 1").collect;
> res15: Array[org.apache.spark.sql.Row] = Array([1570,485888,20163,N,
> AMERICAS,BRAZIL,null,null,null,null,null,6019.2999,17198.0,6019.
> 2999,null,null,QUARTER_GROUP])
>
> scala> hc.sql("select * from testtable1 order by col1 limit 1").collect;
> res16: Array[org.apache.spark.sql.Row] = Array([1570,3930,201607,Y,APJ,INDIA
> SUB-CONTINENT,8741220,null,null,null,null,-208.485216,0.
> 0,-208.485216,null,null,MONTH_ITEM_GROUP])
>
>


saveToCassandra issue. Please help

2017-02-03 Thread shyla deshpande
Hello All,

This is the content of the RDD that I am saving to a Cassandra table.

But it looks like the 2nd row is written first and then the first row
overwrites it, so I end up with bad output.

(494bce4f393b474980290b8d1b6ebef9, 2017-02-01, PT0H9M30S, WEDNESDAY)
(494bce4f393b474980290b8d1b6ebef9, 2017-02-01, PT0H10M0S, WEDNESDAY)

Is there a way to force the order of the rows written to Cassandra?

Please help.

Thanks
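
Cassandra upserts by primary key, so when several rows in one RDD share the
same key the last write wins, and the write order within a single
saveToCassandra call is not something you can rely on or force. The usual
fix is to collapse to the row you want per key before saving. A minimal
sketch, assuming the RDD holds (id, date, duration, day) tuples, the primary
key is (id, date, day), the durations are ISO-8601 strings, and the longer
duration should win; keyspace, table and column names are placeholders:

import java.time.Duration
import com.datastax.spark.connector._

val deduped = rdd                                   // rdd: RDD[(String, String, String, String)]
  .map { case (id, date, duration, day) => ((id, date, day), duration) }
  .reduceByKey { (a, b) =>
    // keep the larger duration; change this if a different row should win
    if (Duration.parse(a).compareTo(Duration.parse(b)) >= 0) a else b
  }
  .map { case ((id, date, day), duration) => (id, date, duration, day) }

deduped.saveToCassandra("my_keyspace", "my_table",
  SomeColumns("id", "date", "duration", "day"))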