Re: Apache Spark - Spark Structured Streaming - Watermark usage

2018-01-26 Thread Jacek Laskowski
Hi,

I'm curious how you would implement the requirement "by a certain amount of
time" without a watermark. How would you know what's current and compute the
lag? Let's forget about watermarks for a moment and see if one pops up as an
inevitable feature :)

"I am trying to filter out records which are lagging behind (based on event
time) by a certain amount of time."

Pozdrawiam,
Jacek Laskowski

https://about.me/JacekLaskowski
Mastering Spark SQL https://bit.ly/mastering-spark-sql
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
Follow me at https://twitter.com/jaceklaskowski

On Fri, Jan 26, 2018 at 7:14 PM, M Singh  wrote:

> Hi:
>
> I am trying to filter out records which are lagging behind (based on event
> time) by a certain amount of time.
>
> Is the watermark API applicable to this scenario (i.e., filtering lagging
> records), or is it only applicable with aggregation? I could not get a
> clear understanding from the documentation, which only refers to its usage
> with aggregation.
>
> Thanks
>
> Mans
>


Apache Spark - Spark Structured Streaming - Watermark usage

2018-01-26 Thread M Singh
Hi:
I am trying to filter out records which are lagging behind (based on event 
time) by a certain amount of time.  
Is the watermark API applicable to this scenario (i.e., filtering lagging
records), or is it only applicable with aggregation? I could not get a clear
understanding from the documentation, which only refers to its usage with
aggregation.

Thanks
Mans

Re: Apache Spark - Custom structured streaming data source

2018-01-26 Thread M Singh
Thanks TD. When is 2.3 scheduled for release?

On Thursday, January 25, 2018 11:32 PM, Tathagata Das  wrote:

 Hello Mans,
The streaming DataSource APIs are still evolving and are not public yet. Hence 
there is no official documentation. In fact, there is a new DataSourceV2 API 
(in Spark 2.3) that we are migrating towards, so at this point in time it's
hard to make any concrete suggestion. You can take a look at the classes 
DataSourceV2, DataReader, MicroBatchDataReader in the spark source code, along 
with their implementations.
Hope this helps. 
TD

On Jan 25, 2018 8:36 PM, "M Singh"  wrote:

Hi:
I am trying to create a custom structured streaming source and would like to 
know if there is any example or documentation on the steps involved.
I've looked at some of the methods available in SparkSession, but these are 
internal to the sql package:
  private[sql] def internalCreateDataFrame(
      catalystRows: RDD[InternalRow],
      schema: StructType,
      isStreaming: Boolean = false): DataFrame = {
    // TODO: use MutableProjection when rowRDD is another DataFrame and the applied
    // schema differs from the existing schema on any field data type.
    val logicalPlan = LogicalRDD(
      schema.toAttributes,
      catalystRows,
      isStreaming = isStreaming)(self)
    Dataset.ofRows(self, logicalPlan)
  }
Please let me know where I can find the appropriate API or documentation.
Thanks
Mans



   

Re: Spark Standalone Mode, application runs, but executor is killed

2018-01-26 Thread Chandu
/Reply from Marco posted in another thread/

Re: Best active groups, forums or contacts for Spark ?
Posted by Marco Mistroni on Jan 26, 2018; 9:08am 
URL:
http://apache-spark-user-list.1001560.n3.nabble.com/Best-active-groups-forums-or-contacts-for-Spark-tp30744p30748.html

Hi
From personal experience... and I might be asking you the obvious:
1. Does it work in standalone mode (no cluster)?
2. Can you break the app into pieces and see at which step the code gets
killed?
3. Have you looked at the Spark GUI to see if the executors go OOM?

I might be oversimplifying what Spark does, but if your logic works
standalone and does not work in a cluster... the cluster might be your
problem (apart from modules not being serializable).
If it breaks in non-cluster mode then it's easier to debug.
I am in no way an expert, just talking from my little personal experience.
I'm sure someone here can give more hints on how to debug a Spark app
properly.
HTH



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Best active groups, forums or contacts for Spark ?

2018-01-26 Thread Chandu
Thanks Marco.
I have provided information in my original post (
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Standalone-Mode-application-runs-but-executor-is-killed-tc30739.html
) so as to keep the context of the ask for anyone looking for similar
information in the future.




--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Spark Standalone Mode, application runs, but executor is killed

2018-01-26 Thread Chandu
@Marco
Thank you.
I thought standalone and standalone cluster were the same?
The app is not a huge app; it's just the PI calculation example.
The value of PI is calculated and passed to the driver successfully.
It is when I call spark.stop() from my driver that I see the KILLED
message on the worker.
I checked the spark GUI for master and worker and did not see any
errors/exceptions reported.

/18/01/26 08:30:32,058 INFO Worker: Asked to kill executor
app-20180126082722-0001/0
18/01/26 08:30:32,064 INFO ExecutorRunner: Runner thread for executor
app-20180126082722-0001/0 interrupted
18/01/26 08:30:32,065 INFO ExecutorRunner: Killing process!/




--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org




Re: Best active groups, forums or contacts for Spark ?

2018-01-26 Thread Marco Mistroni
Hi
From personal experience... and I might be asking you the obvious:
1. Does it work in standalone mode (no cluster)?
2. Can you break the app into pieces and see at which step the code gets
killed?
3. Have you looked at the Spark GUI to see if the executors go OOM?

I might be oversimplifying what Spark does, but if your logic works
standalone and does not work in a cluster... the cluster might be your
problem (apart from modules not being serializable).
If it breaks in non-cluster mode then it's easier to debug.
I am in no way an expert, just talking from my little personal experience.
I'm sure someone here can give more hints on how to debug a Spark app
properly.
HTH


On Jan 26, 2018 1:18 PM, "Chandu"  wrote:

> @Esa Thanks for posting this as I was thinking the same way when trying to
> get some help about Spark (I am just a beginner)
>
> @Jack
> I posted a question here (
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Standalone-Mode-application-runs-but-executor-is-killed-tc30739.html
> ) and on Stack Overflow (
> https://stackoverflow.com/questions/48445145/spark-standalone-mode-application-runs-but-executor-is-killed-with-exitstatus
> ) and haven't received many views or even a comment.
>
> I am new to Spark and maybe my question is framed badly.
> Would you be able to take a look?
>
>
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


Re: how to create a DataType Object using the String representation in Java using Spark 2.2.0?

2018-01-26 Thread Rick Moritz
Hi,

We solved this the ugly way, when parsing external column definitions:

private def columnTypeToFieldType(columnType: String): DataType = {
  columnType match {
    case "IntegerType" => IntegerType
    case "StringType" => StringType
    case "DateType" => DateType
    case "FloatType" => FloatType
    case "DecimalType" => DecimalType.SYSTEM_DEFAULT
    case "TimeStampType" => TimestampType
    case "BooleanType" => BooleanType
    case _ => throw new IllegalArgumentException(
      s"ColumnType $columnType is not known, " +
        s"please add it in the ${this.getClass.getName} class!")
  }
}

There may be a prettier solution than this, but especially with DecimalType
there are limitations where, even with reflection and Class.forName, it's
not trivial (i.e. Class.forName(s"org.apache.spark.sql.types.$columnType")
doesn't get you all the way).
Furthermore, getting a companion object for a class name is a bit uglier
than getting just the class, see
https://stackoverflow.com/questions/11020746/get-companion-object-instance-with-new-scala-reflection-api
Since the number of types can be expected to be roughly constant, the ugly
solution only costs you the overhead of Scala's matching engine. In our
case, the effort of engineering something better was outweighed by a simple
method that might rarely fail, but then does so in a mostly understandable way.

N.B.: mapping isn't complete -- complex types weren't in our scope.
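
A sketch of a slightly less ugly alternative (hypothetical helper: it leans
on the fact that, for primitive types, the toString form reported by dtypes,
e.g. "IntegerType", maps to the JSON type name "integer" that
DataType.fromJson accepts; it still fails for parameterized types such as
DecimalType(10,0)):

```scala
import org.apache.spark.sql.types.DataType

// dtypes reports e.g. "IntegerType", while DataType.fromJson expects
// the JSON form, e.g. "\"integer\"". Strip the suffix and lowercase.
def fromDtypeString(s: String): DataType =
  DataType.fromJson("\"" + s.stripSuffix("Type").toLowerCase + "\"")
```

With that, fromDtypeString("IntegerType") should yield IntegerType, and
likewise for the other primitive types.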

On Fri, Jan 26, 2018 at 8:11 AM, Kurt Fehlhauer  wrote:

> Can you share your code and a sample of your data? Without seeing it, I
> can't give a definitive answer, but I can offer some hints. If you have a
> column of strings you should be able to create a new column cast to
> Integer. This can be accomplished two ways:
>
> df.withColumn("newColumn", df("currentColumn").cast(IntegerType))
>
> or
>
> val df2 = df.selectExpr("cast(currentColumn as int) as newColumn")
>
>
> Without seeing your json, I really can't offer assistance.
>
>
> On Thu, Jan 25, 2018 at 11:39 PM, kant kodali  wrote:
>
>> It seems like it's hard to construct a DataType given its string literal
>> representation.
>>
>> dataframe.dtypes() returns column names and their corresponding types. For
>> example, say I have an integer column named "sum": doing dataframe.dtypes()
>> would return "sum" and "IntegerType", but this string representation
>> "IntegerType" doesn't seem to be very useful because I cannot do
>> DataType.fromJson("IntegerType") -- that throws an error. So I am not
>> quite sure how to construct a DataType given its string representation.
>>
>> On Thu, Jan 25, 2018 at 4:22 PM, kant kodali  wrote:
>>
>>> Hi All,
>>>
>>> I have a datatype "IntegerType" represented as a String and now I want
>>> to create DataType object out of that. I couldn't find in the DataType or
>>> DataTypes api on how to do that?
>>>
>>> Thanks!
>>>
>>
>>
>


Re: Best active groups, forums or contacts for Spark ?

2018-01-26 Thread Chandu
@Esa Thanks for posting this as I was thinking the same way when trying to
get some help about Spark (I am just a beginner)

@Jack
I posted a question here (
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Standalone-Mode-application-runs-but-executor-is-killed-tc30739.html
) and on Stack Overflow (
https://stackoverflow.com/questions/48445145/spark-standalone-mode-application-runs-but-executor-is-killed-with-exitstatus
) and haven't received many views or even a comment.

I am new to Spark and maybe my question is framed badly.
Would you be able to take a look?



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Best active groups, forums or contacts for Spark ?

2018-01-26 Thread Jacek Laskowski
Hi Esa,

I'd say https://stackoverflow.com/questions/tagged/apache-spark is where
many active sparkians hang out :)

Pozdrawiam,
Jacek Laskowski

https://about.me/JacekLaskowski
Mastering Spark SQL https://bit.ly/mastering-spark-sql
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
Follow me at https://twitter.com/jaceklaskowski

On Fri, Jan 26, 2018 at 12:15 PM, Esa Heikkinen  wrote:

> Hi
>
>
>
> It is often difficult to get answers to questions about Spark in many
> forums. Maybe they are inactive or my questions are too bad. I don't know,
> but does anyone know of other good active groups, forums or contacts like
> this one?
>
>
>
> Esa Heikkinen
>
>
>


Best active groups, forums or contacts for Spark ?

2018-01-26 Thread Esa Heikkinen
Hi

It is often difficult to get answers to questions about Spark in many
forums. Maybe they are inactive or my questions are too bad. I don't know,
but does anyone know of other good active groups, forums or contacts like
this one?

Esa Heikkinen