is there a way to create new column with timeuuid using raw spark sql ?

2018-02-01 Thread kant kodali
Hi All,

Is there any way to create a new timeuuid column on an existing dataframe
using raw SQL? You can assume that there is a timeuuid UDF function if that
helps.

Thanks!


Re: is there a way to create new column with timeuuid using raw spark sql ?

2018-02-01 Thread Jean Georges Perrin
Sure, use withColumn()...
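A minimal sketch of that suggestion, assuming an existing DataFrame df and a
hypothetical timeuuid() UDF that has already been registered via
spark.udf.register:

import org.apache.spark.sql.functions.expr

// evaluate the (hypothetical) registered timeuuid() UDF for every row
val withTimeUuid = df.withColumn("event_id", expr("timeuuid()"))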

jg


> On Feb 1, 2018, at 05:50, kant kodali  wrote:
> 
> Hi All,
> 
> Is there any way to create a new timeuuid column of a existing dataframe 
> using raw sql? you can assume that there is a timeuuid udf function if that 
> helps.
> 
> Thanks!





Re: [Structured Streaming] Reuse computation result

2018-02-01 Thread Sandip Mehta
You can use persist() or cache() operation on DataFrame.
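A minimal sketch of what persist()/cache() does on a (batch) DataFrame,
assuming a hypothetical Parquet input; as the question below notes, a
streaming DataFrame cannot be persisted directly this way:

import org.apache.spark.sql.functions.{col, sum}

// persist() caches the result of the filter stage, so both aggregations
// below reuse it instead of re-running the upstream operators
val df = spark.read.parquet("/path/to/input")
  .filter(col("value") > 0)
  .persist()                                    // or .cache()

val total = df.agg(sum(col("value"))).collect() // computes and caches
val rows  = df.count()                          // served from the cache

df.unpersist()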

On Tue, Dec 26, 2017 at 4:02 PM Shu Li Zheng  wrote:

> Hi all,
>
> I have a scenario like this:
>
> val df = dataframe.map().filter()
> // agg 1
> val query1 = df.sum.writeStream.start
> // agg 2
> val query2 = df.count.writeStream.start
>
> With Spark Streaming, we can apply persist() on an RDD to reuse the
> computation result: when we call persist() after filter(), the
> map().filter() stage only runs once. With Structured Streaming, we can't
> apply persist() directly on the DataFrame, so query1 and query2 do not
> reuse the result after filter(), and map/filter run twice. Is there a way
> to solve this?
>
> Regards,
>
> Shu li Zheng
>
>


Re: Apache Spark - Exception on adding column to Structured Streaming DataFrame

2018-02-01 Thread M Singh
Hi TD:
Here is the updated code with explain output and the full stack trace.
Please let me know what could be the issue and what to look for in the explain 
output.
Updated code:
import scala.collection.immutable
import org.apache.spark.sql.functions._
import org.joda.time._
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.sql.streaming._
import org.apache.log4j._

object StreamingTest {
  def main(args: Array[String]): Unit = {
    val sparkBuilder = SparkSession
      .builder
      .config("spark.sql.streaming.checkpointLocation", "./checkpointes")
      .appName("StreamingTest")
      .master("local[4]")

    val spark = sparkBuilder.getOrCreate()

    val schema = StructType(
      Array(
        StructField("id", StringType, false),
        StructField("visit", StringType, false)
      ))

    var dataframeInput = spark.readStream.option("sep", "\t").schema(schema).csv("./data/")
    var dataframe2 = dataframeInput.select("*")
    dataframe2 = dataframe2.withColumn("cts", current_timestamp().cast("long"))
    dataframe2.explain(true)

    val query = dataframe2.writeStream.option("truncate", "false").format("console").start
    query.awaitTermination()
  }
}
Explain output:
== Parsed Logical Plan ==
Project [id#0, visit#1, cast(current_timestamp() as bigint) AS cts#6L]
+- AnalysisBarrier
      +- Project [id#0, visit#1]
         +- StreamingRelation DataSource(org.apache.spark.sql.SparkSession@3c74aa0d,csv,List(),Some(StructType(StructField(id,StringType,false), StructField(visit,StringType,false))),List(),None,Map(sep ->  , path -> ./data/),None), FileSource[./data/], [id#0, visit#1]

== Analyzed Logical Plan ==
id: string, visit: string, cts: bigint
Project [id#0, visit#1, cast(current_timestamp() as bigint) AS cts#6L]
+- Project [id#0, visit#1]
   +- StreamingRelation DataSource(org.apache.spark.sql.SparkSession@3c74aa0d,csv,List(),Some(StructType(StructField(id,StringType,false), StructField(visit,StringType,false))),List(),None,Map(sep ->  , path -> ./data/),None), FileSource[./data/], [id#0, visit#1]

== Optimized Logical Plan ==
Project [id#0, visit#1, 1517499591 AS cts#6L]
+- StreamingRelation DataSource(org.apache.spark.sql.SparkSession@3c74aa0d,csv,List(),Some(StructType(StructField(id,StringType,false), StructField(visit,StringType,false))),List(),None,Map(sep ->  , path -> ./data/),None), FileSource[./data/], [id#0, visit#1]

== Physical Plan ==
*(1) Project [id#0, visit#1, 1517499591 AS cts#6L]
+- StreamingRelation FileSource[./data/], [id#0, visit#1]
Here is the exception:
18/02/01 07:39:52 ERROR MicroBatchExecution: Query [id = a0e573f0-e93b-48d9-989c-1aaa73539b58, runId = b5c618cb-30c7-4eff-8f09-ea1d064878ae] terminated with error
org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to dataType on unresolved object, tree: 'cts
        at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.dataType(unresolved.scala:105)
        at org.apache.spark.sql.types.StructType$$anonfun$fromAttributes$1.apply(StructType.scala:435)
        at org.apache.spark.sql.types.StructType$$anonfun$fromAttributes$1.apply(StructType.scala:435)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.AbstractTraversable.map(Traversable.scala:104)
        at org.apache.spark.sql.types.StructType$.fromAttributes(StructType.scala:435)
        at org.apache.spark.sql.catalyst.plans.QueryPlan.schema$lzycompute(QueryPlan.scala:157)
        at org.apache.spark.sql.catalyst.plans.QueryPlan.schema(QueryPlan.scala:157)
        at org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:448)
        at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:134)
        at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:122)
        at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:122)
        at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:271)
        at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
        at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:122)
        at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
        at org.apache.spark.sql.execution.strea

unsubscribe

2018-02-01 Thread James Casiraghi
unsubscribe


Re: is there a way to create new column with timeuuid using raw spark sql ?

2018-02-01 Thread kant kodali
Hi,

Are you talking about df.withColumn()? If so, that's not what I meant. I
meant creating a new column using raw SQL. In other words, say I don't have
a dataframe and I only have the view name from
df.createOrReplaceTempView("table"), so I can do things like "select * from
table". In a similar fashion, I want to see how I can create a new column
using raw SQL. I am looking at this reference
https://docs.databricks.com/spark/latest/spark-sql/index.html
and I am not seeing a way.

Thanks!

On Thu, Feb 1, 2018 at 4:01 AM, Jean Georges Perrin  wrote:

> Sure, use withColumn()...
>
> jg
>
>
> > On Feb 1, 2018, at 05:50, kant kodali  wrote:
> >
> > Hi All,
> >
> > Is there any way to create a new timeuuid column of a existing dataframe
> using raw sql? you can assume that there is a timeuuid udf function if that
> helps.
> >
> > Thanks!
>
>


spark 2.2.1

2018-02-01 Thread Mihai Iacob
I am setting up a Spark 2.2.1 cluster; however, when I bring up the master and workers (both on Spark 2.2.1) I get the error below. I tried Spark 2.2.0 and get the same error. It works fine on Spark 2.0.2. Have you seen this before? Any idea what's wrong?
 
I found this, but it's in a different situation: https://github.com/apache/spark/pull/19802
 
18/02/01 05:07:22 ERROR Utils: Exception encountered
java.io.InvalidClassException: org.apache.spark.rpc.RpcEndpointRef; local class incompatible: stream classdesc serialVersionUID = -1223633663228316618, local class serialVersionUID = 1835832137613908542
        at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:687)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1885)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1885)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2042)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
        at java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:563)
        at org.apache.spark.deploy.master.WorkerInfo$$anonfun$readObject$1.apply$mcV$sp(WorkerInfo.scala:52)
        at org.apache.spark.deploy.master.WorkerInfo$$anonfun$readObject$1.apply(WorkerInfo.scala:51)
        at org.apache.spark.deploy.master.WorkerInfo$$anonfun$readObject$1.apply(WorkerInfo.scala:51)
        at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1303)
        at org.apache.spark.deploy.master.WorkerInfo.readObject(WorkerInfo.scala:51)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1158)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2178)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:433)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
        at org.apache.spark.deploy.master.FileSystemPersistenceEngine.org$apache$spark$deploy$master$FileSystemPersistenceEngine$$deserializeFromFile(FileSystemPersistenceEngine.scala:80)
        at org.apache.spark.deploy.master.FileSystemPersistenceEngine$$anonfun$read$1.apply(FileSystemPersistenceEngine.scala:56)
        at org.apache.spark.deploy.master.FileSystemPersistenceEngine$$anonfun$read$1.apply(FileSystemPersistenceEngine.scala:56)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
        at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
        at org.apache.spark.deploy.master.FileSystemPersistenceEngine.read(FileSystemPersistenceEngine.scala:56)
        at org.apache.spark.deploy.master.PersistenceEngine$$anonfun$readPersistedData$1.apply(PersistenceEngine.scala:87)
        at org.apache.spark.deploy.master.PersistenceEngine$$anonfun$readPersistedData$1.apply(PersistenceEngine.scala:86)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
        at org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:316)
packet_write_wait: Connection to 9.30.118.193 port 22: Broken pipe
 
 
Regards, 

Mihai Iacob
DSX Local - Security, IBM Analytics





Re: is there a way to create new column with timeuuid using raw spark sql ?

2018-02-01 Thread Subhash Sriram
If you have the temp view name (table, for example), couldn't you do
something like this?

val dfWithColumn = spark.sql("select *, timeuuid() as new_column from table")

Thanks,
Subhash

On Thu, Feb 1, 2018 at 11:18 AM, kant kodali  wrote:

> Hi,
>
> Are you talking about df.withColumn() ? If so, thats not what I meant. I
> meant creating a new column using raw sql. otherwords say I dont have a
> dataframe I only have the view name from df.createOrReplaceView("table")
> so I can do things like "select * from table" so in a similar fashion I
> want to see how I can create a new Column using the raw sql. I am looking
> at this reference https://docs.databricks.com/spark/latest/
> spark-sql/index.html and I am not seeing a way.
>
> Thanks!
>
> On Thu, Feb 1, 2018 at 4:01 AM, Jean Georges Perrin  wrote:
>
>> Sure, use withColumn()...
>>
>> jg
>>
>>
>> > On Feb 1, 2018, at 05:50, kant kodali  wrote:
>> >
>> > Hi All,
>> >
>> > Is there any way to create a new timeuuid column of a existing
>> dataframe using raw sql? you can assume that there is a timeuuid udf
>> function if that helps.
>> >
>> > Thanks!
>>
>>
>


Re: is there a way to create new column with timeuuid using raw spark sql ?

2018-02-01 Thread Liana Napalkova
unsubscribe



DISCLAIMER: Privileged/Confidential Information may be contained in this 
message. If you are not the addressee indicated in this message you should 
destroy this message, and notify us immediately to the following address: 
le...@eurecat.org. If the addressee of this message does not consent to the use 
of Internet e-mail and message recording, please notify us immediately.





Spark JDBC bulk insert

2018-02-01 Thread Subhash Sriram
Hey everyone,

I have a use case where I will be processing data in Spark and then writing
it back to MS SQL Server.

Is it possible to use bulk insert functionality and/or batch the writes
back to SQL?

I am using the DataFrame API to write the rows:

df.write.jdbc(...)
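A minimal sketch of a batched write, assuming a hypothetical SQL Server JDBC
URL, target table, and credentials; the JDBC data source groups inserts into
JDBC batches, and the batchsize option controls how many rows go into each
batch:

import java.util.Properties

val props = new Properties()
props.setProperty("user", "spark_user")      // placeholder credentials
props.setProperty("password", "secret")
props.setProperty("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")

df.write
  .mode("append")
  .option("batchsize", "10000")              // rows per JDBC batch
  .jdbc("jdbc:sqlserver://host:1433;databaseName=mydb", "dbo.target_table", props)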

Thanks in advance for any ideas or assistance!
Subhash


does Kinesis Connector for structured streaming auto-scales receivers if a cluster is using dynamic allocation and auto-scaling?

2018-02-01 Thread Mikhailau, Alex
does Kinesis Connector for structured streaming auto-scales receivers if a 
cluster is using dynamic allocation and auto-scaling?


Structure streaming to hive with kafka 0.9

2018-02-01 Thread KhajaAsmath Mohammed
Hi,

Could anyone please share an example of how to use Spark Structured Streaming
with Kafka and write the data into Hive? The versions I have are:

Spark 2.1 on CDH5.10
Kafka 0.9

Thanks,
Asmath 

Sent from my iPhone