Re: [ML] MLeap: Deploy Spark ML Pipelines w/o SparkContext

2017-02-04 Thread Aseem Bansal
@Debasish I see that the Spark version used in the project you mentioned is 1.6.0. I would suggest taking a look at some blogs about Spark 2.0 Pipelines and Models in the new ml package. The new ml package's API, as of the latest Spark 2.1.0 release, has no way to call predict on a single
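A common workaround is to wrap the single record in a one-row DataFrame and call transform on the fitted PipelineModel. A minimal Scala sketch follows, assuming a hypothetical model path and feature columns; note that this still requires a live SparkSession, which is exactly the overhead this thread is trying to avoid.

    import org.apache.spark.ml.PipelineModel
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("single-row-scoring").getOrCreate()
    import spark.implicits._

    // Hypothetical path and feature columns -- adjust to your own pipeline.
    val model = PipelineModel.load("/models/myPipeline")
    val singleRow = Seq((0.5, 1.2)).toDF("feature1", "feature2")

    // transform() scores a whole DataFrame; a one-row DataFrame emulates predict().
    model.transform(singleRow).show()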

Re: [ML] MLeap: Deploy Spark ML Pipelines w/o SparkContext

2017-02-04 Thread Chris Fregly
To date, I haven't seen very good performance coming from MLeap. I believe Ram from Databricks keeps getting you guys on stage at the Spark Summits, but I've been unimpressed with the performance numbers, as well as your choice to reimplement your own non-standard "pmml-like" mechanism, which incurs

Turning rows into columns

2017-02-04 Thread Paul Tremblay
I am using pyspark 2.1 and am wondering how to convert a flat file, with one record per row, into a columnar format. Here is an example of the data: u'WARC/1.0', u'WARC-Type: warcinfo', u'WARC-Date: 2016-12-08T13:00:23Z', u'WARC-Record-ID: ', u'Content-Length: 344', u'Content-Type:
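The thread is in pyspark, but one approach translates directly: treat each "WARC/1.0" marker as a record delimiter, parse the "Key: Value" header lines of each record into a map, and project the keys of interest as columns. A hedged Scala sketch under those assumptions (file path and selected columns are invented):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

    // Split the file on the WARC record marker instead of newlines.
    val conf = new Configuration(spark.sparkContext.hadoopConfiguration)
    conf.set("textinputformat.record.delimiter", "WARC/1.0")
    val records = spark.sparkContext.newAPIHadoopFile(
      "warc-headers.txt", classOf[TextInputFormat],
      classOf[LongWritable], classOf[Text], conf).map(_._2.toString)

    // Parse "Key: Value" lines; the limit of 2 keeps colons inside values intact.
    import spark.implicits._
    val df = records.map { rec =>
      val kv = rec.split("\n").flatMap { line =>
        line.split(":", 2) match {
          case Array(k, v) => Some(k.trim -> v.trim)
          case _           => None
        }
      }.toMap
      (kv.getOrElse("WARC-Type", ""), kv.getOrElse("WARC-Date", ""),
       kv.getOrElse("Content-Length", ""))
    }.toDF("warcType", "warcDate", "contentLength")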

Re: Spark streaming: Could not initialize class kafka.consumer.FetchRequestAndResponseStatsRegistry$

2017-02-04 Thread Marco Mistroni
Hi, not sure if this will help at all, and please take it with a pinch of salt as I don't have your setup and I am not running on a cluster. I have tried to run a Kafka example, which was originally working on Spark 1.6.1, on Spark 2. These are the jars I am using

Mismatched datatype in Case statement

2017-02-04 Thread Aviral Agarwal
Hi, I was trying Spark version 1.6.0 when I ran into the error mentioned in the following Hive JIRA: https://issues.apache.org/jira/browse/HIVE-5825. This error occurred in both cases: using SQLContext or HiveContext. Any indication whether this has been fixed in a later Spark version? If
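HIVE-5825 is about CASE expressions whose branches return different datatypes. A workaround that usually sidesteps the analyzer error is to CAST every branch to one explicit type. A sketch against an invented table (sqlContext here is the Spark 1.6 shell's SQLContext/HiveContext):

    // Hypothetical table and columns; the point is that both CASE branches
    // are cast to the same explicit type, so the analyzer sees one datatype.
    val result = sqlContext.sql("""
      SELECT id,
             CASE WHEN score > 10 THEN CAST(1.0 AS DOUBLE)
                  ELSE CAST(0 AS DOUBLE)
             END AS flag
      FROM my_table
    """)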

Spark streaming: Could not initialize class kafka.consumer.FetchRequestAndResponseStatsRegistry$

2017-02-04 Thread Mich Talebzadeh
I am getting this error with Spark 2; the same code works with CDH 5.5.1 (Spark 1.5). Admittedly I am messing around with spark-shell. However, I am surprised that this does not work with Spark 2 when it is OK with CDH 5.5.1. scala> val dstream = KafkaUtils.createDirectStream[String, String, StringDecoder,
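That signature (with StringDecoder type parameters) comes from the old 0.8-style spark-streaming-kafka artifact, and this particular error is commonly a mismatch between the Kafka jars on the classpath and that connector. One route under Spark 2 is the spark-streaming-kafka-0-10 artifact, which replaces kafka.serializer decoders with Kafka deserializers. A sketch with invented broker and topic names (sc is the shell's SparkContext):

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    val ssc = new StreamingContext(sc, Seconds(10))

    // Invented broker and group names -- adjust to your own setup.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "spark-shell-test",
      "auto.offset.reset"  -> "latest"
    )

    val dstream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("mytopic"), kafkaParams))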

Re: [ML] MLeap: Deploy Spark ML Pipelines w/o SparkContext

2017-02-04 Thread Debasish Das
Except of course LDA, ALS and neural net models. For those, the model needs to be either pre-scored and cached in a KV store, or the matrices/graph should be kept in a KV store so they can be accessed through a REST API to serve the output. For neural nets it is more fun, since the model is a distributed or local graph over

Re: [ML] MLeap: Deploy Spark ML Pipelines w/o SparkContext

2017-02-04 Thread Debasish Das
If we expose an API to access the raw models out of PipelineModel, can't we call predict directly on it from an API? Is there a task open to expose the model out of PipelineModel so that predict can be called on it? There is no dependency on the Spark context in an ml model... On Feb 4, 2017 9:11 AM,
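For the first part of the question: PipelineModel already exposes its fitted stages through the public stages array, so the underlying model can be pulled out today without a new API. A sketch, where the model path and the stage type are invented:

    import org.apache.spark.ml.PipelineModel
    import org.apache.spark.ml.classification.LogisticRegressionModel

    val pipelineModel = PipelineModel.load("/models/myPipeline")  // hypothetical path

    // stages is a public Array[Transformer]; the fitted estimator is usually last.
    val lr = pipelineModel.stages.last.asInstanceOf[LogisticRegressionModel]
    println(lr.coefficients)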

Re: [ML] MLeap: Deploy Spark ML Pipelines w/o SparkContext

2017-02-04 Thread Aseem Bansal
- In Spark 2.0 there is a class called PipelineModel. I know that the title says Pipeline, but it is actually talking about a PipelineModel trained via a Pipeline. - Why PipelineModel instead of Pipeline? Because usually there is a series of steps that need to be done when doing
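To make the Pipeline vs. PipelineModel distinction concrete, here is a minimal Scala sketch with invented toy data and column names: the Pipeline is the estimator you fit once; the resulting PipelineModel is the transformer that replays the same preprocessing stages at scoring time.

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}

    import spark.implicits._  // spark: the shell's SparkSession

    // Toy data: a categorical column, two numeric features, a label.
    val trainingDf = Seq(("a", 1.0, 2.0, 0.0), ("b", 0.5, 1.5, 1.0))
      .toDF("category", "f1", "f2", "label")

    val indexer = new StringIndexer().setInputCol("category").setOutputCol("categoryIdx")
    val assembler = new VectorAssembler()
      .setInputCols(Array("categoryIdx", "f1", "f2")).setOutputCol("features")
    val lr = new LogisticRegression().setLabelCol("label").setFeaturesCol("features")

    // Pipeline (estimator) in, PipelineModel (transformer) out.
    val pipelineModel = new Pipeline().setStages(Array(indexer, assembler, lr)).fit(trainingDf)
    val scored = pipelineModel.transform(trainingDf)  // replays the same stages when scoring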

Re: [ML] MLeap: Deploy Spark ML Pipelines w/o SparkContext

2017-02-04 Thread Debasish Das
I am not sure why I would use a pipeline to do scoring... The idea is to build a model, use the model ser/deser feature to put it in the row or column store of choice, and provide API access to the model... We support these primitives in github.com/Verizon/trapezium... The API has access to the Spark context in

Re: How to checkpoint an RDD after a stage and before reaching an action?

2017-02-04 Thread Koert Kuipers
This is a general problem with checkpoint, one of the least understood operations, I think. Checkpoint is lazy (meaning it doesn't start until there is an action) and asynchronous (meaning when it does start, it is its own computation). So basically with a checkpoint the RDD always gets computed
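Koert's point in code form: a minimal sketch (paths invented) showing why an RDD should normally be cached before it is checkpointed, so the separate checkpoint job does not recompute the whole lineage.

    val expensive = sc.textFile("hdfs:///data/input")  // hypothetical path
      .map(line => line.length)                        // stand-in for costly work

    sc.setCheckpointDir("hdfs:///tmp/checkpoints")
    expensive.cache()       // without this, the checkpoint job recomputes the lineage
    expensive.checkpoint()  // lazy: nothing happens yet
    expensive.count()       // first action runs the job AND triggers the checkpoint write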

Re: [ML] MLeap: Deploy Spark ML Pipelines w/o SparkContext

2017-02-04 Thread Aseem Bansal
Does this support Java 7? What is your timezone in case someone wanted to talk? On Fri, Feb 3, 2017 at 10:23 PM, Hollin Wilkins wrote: > Hey Aseem, > > We have built pipelines that execute several string indexers, one hot > encoders, scaling, and a random forest or linear

Re: specifying schema on dataframe

2017-02-04 Thread Sam Elamin
Hi Dirceu, thanks, you're right! That did work. But now I'm facing an even bigger problem: since I don't have access to change the underlying data, I just want to apply a schema over something that was written via sparkContext.newAPIHadoopRDD. Basically I am reading in an RDD[JsonObject] and would
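One way to do this under Spark 2.x, sketched below: serialize each JsonObject back to a JSON string and hand the strings to the JSON reader with an explicit schema attached. Here jsonRdd stands in for the RDD[JsonObject] from newAPIHadoopRDD, and the schema fields are invented.

    import org.apache.spark.sql.types._

    // Hypothetical schema; adjust field names and types to the real data.
    val schema = StructType(Seq(
      StructField("customerId", DoubleType, nullable = true),
      StructField("name",       StringType, nullable = true)))

    // jsonRdd: RDD[JsonObject] -- stand-in for the output of newAPIHadoopRDD.
    val jsonStrings = jsonRdd.map(_.toString)
    val df = spark.read.schema(schema).json(jsonStrings)  // parsed with the explicit schema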

Re: specifying schema on dataframe

2017-02-04 Thread Dirceu Semighini Filho
Hi Sam, remove the " from the number and it will work. On Feb 4, 2017, 11:46 AM, "Sam Elamin" wrote: > Hi All > > I would like to specify a schema when reading from a json but when trying > to map a number to a Double it fails; I tried FloatType and IntType with

specifying schema on dataframe

2017-02-04 Thread Sam Elamin
Hi All, I would like to specify a schema when reading from JSON, but when trying to map a number to a Double it fails; I tried FloatType and IntType with no joy! When inferring the schema, customer id is set to String, and I would like to cast it as Double, so df1 is corrupted while df2 shows
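Two approaches that usually work, sketched below with invented file and column names: declare DoubleType in an explicit schema at read time, or let inference run and cast afterwards. If the number is quoted in the JSON (the issue Dirceu points to in his reply), reading it directly as DoubleType tends to yield nulls, so the cast-after-inference route is usually the safer one.

    import org.apache.spark.sql.functions.col
    import org.apache.spark.sql.types._

    val schema = StructType(Seq(
      StructField("customerId", DoubleType, nullable = true)))

    // Option 1: explicit schema at read time (fails to parse quoted numbers).
    val df1 = spark.read.schema(schema).json("customers.json")

    // Option 2: let inference type it as String, then cast.
    val df2 = spark.read.json("customers.json")
      .withColumn("customerId", col("customerId").cast(DoubleType))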

Re: NoNodeAvailableException (None of the configured nodes are available) error when trying to push data to Elastic from a Spark job

2017-02-04 Thread Jacek Laskowski
Hi, I'd say the error says it all: Caused by: NoNodeAvailableException[None of the configured nodes are available: [{#transport#-1}{XX.XXX.XXX.XX}{XX.XXX.XXX.XX:9300}]] Jacek On 3 Feb 2017 7:58 p.m., "Anastasios Zouzias" wrote: Hi there, Are you sure that the cluster
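The trace shows a native TransportClient trying the binary transport port 9300, so the usual suspects are the cluster.name setting and transport-port reachability from the executors. If the job can use the elasticsearch-hadoop connector instead, it speaks HTTP on 9200 and avoids transport-client node discovery entirely. A hedged sketch with invented host and index names (df is an existing DataFrame):

    // Requires the elasticsearch-spark artifact on the classpath.
    df.write
      .format("org.elasticsearch.spark.sql")
      .option("es.nodes", "es-host")  // invented host name
      .option("es.port", "9200")      // HTTP port, not the 9300 transport port
      .mode("append")
      .save("myindex/mytype")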

NullPointerException while joining two avro Hive tables

2017-02-04 Thread Понькин Алексей
Hi, I have a table in Hive (data is stored as avro files). Using the python Spark shell I am trying to join two datasets: events = spark.sql('select * from mydb.events') intersect = events.where('attr2 in (5,6,7) and attr1 in (1,2,3)') intersect.count() But I am constantly receiving the following

Re: java.lang.NoSuchMethodError: scala.runtime.ObjectRef.zero()Lscala/runtime/ObjectRef

2017-02-04 Thread Sam Elamin
Hi sathyanarayanan, zero() on scala.runtime.VolatileObjectRef was introduced in Scala 2.11. You probably have a library compiled against Scala 2.11 running on a Scala 2.10 runtime. See v2.10: https://github.com/scala/scala/blob/2.10.x/src/library/scala/runtime/VolatileObjectRef.java
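The standard guard against this in sbt is the %% operator, which resolves every artifact for the project's own Scala binary version. A sketch with illustrative version numbers:

    // build.sbt -- versions are illustrative
    scalaVersion := "2.10.6"  // must match the Scala line of the Spark runtime

    // %% appends the Scala binary version (_2.10 here) to the artifact name,
    // preventing a _2.11 jar from ending up on a 2.10 runtime.
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.3" % "provided"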

java.lang.NoSuchMethodError: scala.runtime.ObjectRef.zero()Lscala/runtime/ObjectRef

2017-02-04 Thread sathyanarayanan mudhaliyar
Hi, I got the error below when executing: Exception in thread "main" java.lang.NoSuchMethodError: scala.runtime.ObjectRef.zero()Lscala/runtime/ObjectRef; Error in detail: Exception in thread "main" java.lang.NoSuchMethodError: scala.runtime.ObjectRef.zero()Lscala/runtime/ObjectRef; at

Re: Are DoubleWritable and DoubleObjectInspector doing the same thing in a Hive UDF?

2017-02-04 Thread Alex
Hi, please reply? On Fri, Feb 3, 2017 at 8:19 PM, Alex wrote: > Hi, > > Can you guys tell me if the two pieces of code below return the same > thing? > > (((DoubleObjectInspector) ins2).get(obj)); and ((DoubleWritable) obj).get(); > from the two codes below > > > code 1)
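To the question itself: the two expressions coincide only when the incoming object really is a DoubleWritable; the ObjectInspector route is the general contract. A small Scala sketch of the distinction (the inspector and object are taken as parameters, since the original UDF code is not fully shown):

    import org.apache.hadoop.hive.serde2.io.DoubleWritable
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.DoubleObjectInspector

    def readBoth(ins2: DoubleObjectInspector, obj: AnyRef): (Double, Double) = {
      // Inspector-based: works for whatever physical layout the inspector
      // describes (lazy, writable, or plain java.lang.Double).
      val viaInspector = ins2.get(obj)
      // Cast-based: only safe when obj is literally a DoubleWritable; otherwise
      // it throws ClassCastException even though the logical value is the same.
      val viaCast = obj.asInstanceOf[DoubleWritable].get()
      (viaInspector, viaCast)
    }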

Re: spark architecture question -- Please Read

2017-02-04 Thread Mich Talebzadeh
Ingesting from Hive tables back into Oracle: what mechanisms are in place to ensure that data ends up consistently in the Oracle table, and that Spark is notified when Oracle has issues with the ingested data (say a rollback)?
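There is no built-in two-phase commit here: Spark's JDBC writer inserts per partition in separate transactions, so a failed job can leave partial rows, and Spark is not notified of anything Oracle does afterwards. A common pattern is to land the data in an Oracle staging table and let Oracle validate and move it atomically. A hedged sketch with invented connection details (the Oracle JDBC driver must be on the classpath):

    // Hypothetical Hive table, Oracle URL, and credentials.
    spark.table("mydb.my_hive_table")
      .write
      .mode("append")
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB")
      .option("dbtable", "STG.MY_STAGING_TABLE")  // staging table, moved atomically on the Oracle side
      .option("user", "etl_user")
      .option("password", "****")
      .save()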