Re: How to configure spark with java

2017-07-23 Thread Patrik Medvedev
What exactly do you need?
Basically, you need to add the Spark libraries to your pom.
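For example, a minimal sketch of the Maven dependencies (the version and Scala
suffix below are assumptions; match them to your cluster's Spark and Scala
build):

<!-- Assumed version/Scala suffix; adjust to your environment -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.2.0</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.11</artifactId>
  <version>2.2.0</version>
</dependency>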

Mon, 24 Jul 2017 at 6:22, amit kumar singh :

> Hello everyone
>
> I want to use spark with java API
>
> Please let me know how I can configure it
>
>
> Thanks
> A
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


Re: Is there a way to run Spark SQL through REST?

2017-07-23 Thread Sumedh Wale

Yes, with the new Spark structured streaming you can keep submitting
streaming jobs against the same SparkContext in different requests (or you
can create a new SparkContext in a request if required). The SparkJob
implementation gets a handle to the SparkContext, which will be either the
existing one or a new one depending on the REST API calls -- see the
spark-jobserver GitHub page for details on transient vs. persistent
SparkContexts.

With the old Spark streaming model, you cannot add new DStreams once the
StreamingContext has started (a limitation of the old streaming model), so
you can submit against the same context, but only until the last job starts
the StreamingContext.
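
For illustration, a minimal sketch of such a job against the shared context,
written against spark-jobserver's classic SparkJob API (the exact package and
signatures can differ between jobserver versions, and the "sql" config key is
made up):

import com.typesafe.config.Config
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession
import spark.jobserver.{SparkJob, SparkJobValid, SparkJobValidation}

// Hypothetical job: runs a SQL statement passed in the REST request against
// the SparkContext that spark-jobserver hands in (persistent or transient).
object AdHocSqlJob extends SparkJob {
  override def validate(sc: SparkContext, config: Config): SparkJobValidation =
    SparkJobValid

  override def runJob(sc: SparkContext, config: Config): Any = {
    val spark = SparkSession.builder.config(sc.getConf).getOrCreate()
    spark.sql(config.getString("sql")).collect()
  }
}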

regards
sumedh

On Monday 24 July 2017 06:09 AM, kant kodali wrote:
> @Sumedh Can I run streaming jobs on the same context with spark-jobserver?
> So there is no waiting for results, since the Spark SQL job is expected to
> stream forever and the results of each streaming job are captured through a
> message queue.
>
> In my case each Spark SQL query will be a streaming job.
>
> On Sat, Jul 22, 2017 at 6:19 AM, Sumedh Wale wrote:
>> On Saturday 22 July 2017 01:31 PM, kant kodali wrote:
>>> Is there a way to run Spark SQL through REST?
>>
>> There is spark-jobserver (https://github.com/spark-jobserver/spark-jobserver).
>> It does more than just a REST API (like a long-running SparkContext).
>>
>> regards
>>
>> --
>> Sumedh Wale
>> SnappyData (http://www.snappydata.io)

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Question on Spark code

2017-07-23 Thread Reynold Xin
This is a standard practice used for chaining, to support

a.setStepSize(...)
  .setRegParam(...)
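
For illustration, here is a minimal sketch (the classes below are made up, not
the actual MLlib code) of why a setter returning this.type preserves the
concrete subtype through a chain of calls:

class Optimizer {
  private var stepSize: Double = 1.0

  def setStepSize(step: Double): this.type = {
    require(step > 0, s"Initial step size must be positive but got $step")
    this.stepSize = step
    this  // returning `this`, typed as this.type, allows chaining
  }
}

class MomentumOptimizer extends Optimizer {
  private var momentum: Double = 0.9

  def setMomentum(m: Double): this.type = {
    this.momentum = m
    this
  }
}

// Because setStepSize returns this.type rather than Optimizer, the chained
// expression below still has type MomentumOptimizer, so setMomentum is
// available after it:
val opt = new MomentumOptimizer().setStepSize(0.1).setMomentum(0.8)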


On Sun, Jul 23, 2017 at 8:47 PM, tao zhan  wrote:

> Thank you for replying.
> But I do not completely get it. Why is the "this.type" necessary?
> Why couldn't it be like this:
>
> def setStepSize(step: Double): Unit = {
>   require(step > 0,
>     s"Initial step size must be positive but got ${step}")
>   this.stepSize = step
> }
>
> On Mon, Jul 24, 2017 at 11:29 AM, M. Muvaffak ONUŞ <
> onus.muvaf...@gmail.com> wrote:
>
>> Doesn't it mean the return type will be the type of "this" class? So it
>> doesn't have to be this instance of the class, but it has to be the type of
>> this instance of the class. When you have a stack of inheritance and call
>> that function, it will return the same type as the level at which you
>> called it.
>>
>> On Sun, Jul 23, 2017 at 8:20 PM Reynold Xin  wrote:
>>
>>> It means the same object ("this") is returned.
>>>
>>> On Sun, Jul 23, 2017 at 8:16 PM, tao zhan  wrote:
>>>
 Hello,

 I am new to scala and spark.
 What is the "this.type" in the set function for?

 https://github.com/apache/spark/blob/481f0792944d9a77f0fe8b5e2596da1d600b9d0a/mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala#L48

 Thanks!

 Zhan

>>>
>>>
>


How to configure spark with java

2017-07-23 Thread amit kumar singh
Hello everyone

I want to use spark with java API

Please let me know how I can configure it


Thanks
A

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Question on Spark code

2017-07-23 Thread Reynold Xin
It means the same object ("this") is returned.

On Sun, Jul 23, 2017 at 8:16 PM, tao zhan  wrote:

> Hello,
>
> I am new to scala and spark.
> What is the "this.type" in the set function for?
>
> https://github.com/apache/spark/blob/481f0792944d9a77f0fe8b5e2596da1d600b9d0a/mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala#L48
>
> Thanks!
>
> Zhan
>


How to convert the binary from Kafka to string, please

2017-07-23 Thread ??????????
Hi all 


I want to convert the binary value from Kafka to a string. Could you help me, please?


val df = ss.readStream.format("kafka").option("kafka.bootstrap.servers", "")
  .option("subscribe", "")
  .load()


val value = df.select("value")


value.writeStream
.outputMode("append")
.format("console")
.start()
.awaitTermination()




The above code outputs a result like:


+-------+
|  value|
+-------+
|[61,61]|
+-------+




61 is the character 'a' received from Kafka.
I want to print [a,a] or aa instead.
How should I do that, please?
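
A minimal sketch of one common approach, assuming the value bytes are UTF-8
text: cast the binary column to a string before writing to the console.

// Cast the binary Kafka value to a UTF-8 string column
val strings = df.selectExpr("CAST(value AS STRING)")

strings.writeStream
  .outputMode("append")
  .format("console")
  .start()
  .awaitTermination()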

Re: Is there a way to run Spark SQL through REST?

2017-07-23 Thread kant kodali
@Sumedh Can I run streaming jobs on the same context with spark-jobserver?
So there is no waiting for results, since the Spark SQL job is expected to
stream forever and the results of each streaming job are captured through a
message queue.

In my case each Spark SQL query will be a streaming job.

On Sat, Jul 22, 2017 at 6:19 AM, Sumedh Wale  wrote:

> On Saturday 22 July 2017 01:31 PM, kant kodali wrote:
>
>> Is there a way to run Spark SQL through REST?
>>
>
> There is spark-jobserver (https://github.com/spark-jobserver/spark-jobserver).
> It does more than just a REST API (like a long-running SparkContext).
>
> regards
>
> --
> Sumedh Wale
> SnappyData (http://www.snappydata.io)
>
>


Re: Get full RDD lineage for a spark job

2017-07-23 Thread Ron Gonzalez
Cool, thanks. Will give that a try...
--Ron 

On Friday, July 21, 2017 8:09 PM, Keith Chapman wrote:

 You could also enable it with --conf spark.logLineage=true if you do not want 
to change any code.

Regards,
Keith.
http://keith-chapman.com

On Fri, Jul 21, 2017 at 7:57 PM, Keith Chapman  wrote:

Hi Ron,

You can try using the toDebugString method on the RDD; this will print the RDD
lineage.

Regards,
Keith.
http://keith-chapman.com
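
For reference, a minimal sketch of both suggestions (the RDD and input path
below are made up):

// Hypothetical RDD; any RDD with a few transformations will do
val rdd = sc.textFile("hdfs:///tmp/input.txt")
  .flatMap(_.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)

// Prints the lineage: the chain of parent RDDs and their dependencies
println(rdd.toDebugString)

// Alternatively, with no code changes, the lineage is logged when an action
// runs if the application is launched with --conf spark.logLineage=true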

On Fri, Jul 21, 2017 at 11:24 AM, Ron Gonzalez wrote:

Hi,

Can someone point me to a test case or share sample code that is able to
extract the RDD graph from a Spark job anywhere during its lifecycle? I
understand that Spark has a UI that can show the graph of the execution, so
I'm hoping it uses some API somewhere that I could use. I know the RDD graph
is the actual execution graph, so if there is also a more logical abstraction
API, closer to calls like map, filter, aggregate, etc., that would be even
better.

Appreciate any help...

Thanks,
Ron






Re: custom joins on dataframe

2017-07-23 Thread Michael Armbrust
>
> left.join(right, my_fuzzy_udf (left("cola"),right("cola")))
>

While this could work, the problem will be that we'll have to check every
possible combination of tuples from left and right using your UDF.  It
would be best if you could somehow partition the problem so that we could
reduce the number of comparisons.  For example, if you had a fuzzy hash
that you could do an equality check on in addition to the UDF, that would
greatly speed up the computation.
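
For illustration, a sketch of that idea (fuzzyHash and my_fuzzy_udf are assumed
to be user-defined functions, and the column names are made up):

import org.apache.spark.sql.functions.col

// The equi-join condition on the hash column gives Spark a key to shuffle and
// match on, so only rows whose fuzzy hashes collide are compared by the
// expensive fuzzy UDF, instead of every left/right pair.
val leftHashed  = left.withColumn("fuzzy_hash", fuzzyHash(col("cola")))
val rightHashed = right.withColumn("fuzzy_hash", fuzzyHash(col("cola")))

val joined = leftHashed.join(
  rightHashed,
  leftHashed("fuzzy_hash") === rightHashed("fuzzy_hash") &&
    my_fuzzy_udf(leftHashed("cola"), rightHashed("cola"))
)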


java.lang.NoClassDefFoundError: scala/runtime/AbstractPartialFunction$mcJL$sp

2017-07-23 Thread Kaushal Shriyan
I am facing an issue while connecting Apache Spark to the Apache Cassandra
datastore.


[root@bin]# ./spark-shell --jars ../jars/spark-cassandra-connector-assembly-2.0.3-36-g9a50162.jar
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/07/23 23:12:56 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/07/23 23:13:01 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://111.23.140.15:4040
Spark context available as 'sc' (master = spark://172.16.214.126:7077, app id = app-20170723231257-0008).
Spark session available as 'spark'.
Welcome to Spark version 2.2.0
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131)
Type in expressions to have them evaluated.
Type :help for more information.

scala> sc.stop

scala> import com.datastax.spark.connector._, org.apache.spark.SparkContext, org.apache.spark.SparkContext._, org.apache.spark.SparkConf
import com.datastax.spark.connector._
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

scala> val conf = new SparkConf(true).set("spark.cassandra.connection.host","172.16.214.41")
conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@7d0e43d6

scala> val sc = new SparkContext(conf)
sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@202b5293

scala> val test_spark_rdd = sc.cassandraTable("test_spark", "test")
test_spark_rdd: com.datastax.spark.connector.rdd.CassandraTableScanRDD[com.datastax.spark.connector.CassandraRow] = CassandraTableScanRDD[0] at RDD at CassandraRDD.scala:16

scala> test_spark_rdd.first
17/07/23 23:15:04 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 172.16.214.41, executor 0): java.lang.NoClassDefFoundError: scala/runtime/AbstractPartialFunction$mcJL$sp
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at com.datastax.spark.connector.rdd.CassandraLimit$.limitForIterator(CassandraLimit.scala:21)
    at com.datastax.spark.connector.rdd.CassandraTableScanRDD.compute(CassandraTableScanRDD.scala:368)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: scala.runtime.AbstractPartialFunction$mcJL$sp
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 22 more
17/07/23 23:15:04 WARN TaskSetManager: Lost task 0.1 in stage 0.0 (TID 1, 172.16.214.41, executor 0): java.lang.NoClassDefFoundError: com/datastax/spark/connector/rdd/CassandraLimit$$anonfun$limitForIterator$1
    at com.datastax.spark.connector.rdd.CassandraLimit$.limitForIterator(CassandraLimit.scala:21)
    at com.datastax.spark.connector.rdd.CassandraTableScanRDD.compute(CassandraTableScanRDD.scala:368)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.s

unsubscribe

2017-07-23 Thread Vasilis Hadjipanos
Please unsubscribe me