Re: SPARK-SQL: Pattern Detection on Live Event or Archived Event Data

2016-03-02 Thread Reynold Xin
SQL is very common, and even some business analysts learn it. Scala and
Python are great, but the easiest language to use is often the language a
user already knows. And for a lot of users, that is SQL.

On Wednesday, March 2, 2016, Jerry Lam  wrote:

> Hi guys,
>
> FYI... this wiki page (StreamSQL: https://en.wikipedia.org/wiki/StreamSQL)
> has some history related to Event Stream Processing and SQL.
>
> Hi Steve,
>
> It is difficult to ask your customers to learn a new language when they
> are not programmers :)
> I don't know where or why they learn SQL-like languages. Do business
> schools teach SQL?
>
> Best Regards,
>
> Jerry
>
>
> On Wed, Mar 2, 2016 at 10:03 AM, Steve Loughran wrote:
>
>>
>> > On 1 Mar 2016, at 22:25, Jerry Lam wrote:
>> >
>> > Hi Reynold,
>> >
>> > You are right. It is about the audience. For instance, in many of my
>> > cases, the SQL style is very attractive, if not mandatory, for people
>> > with minimal programming knowledge.
>>
>> but SQL skills instead. SQL is just relational set theory with a syntax:
>> Structured English Query Language, from IBM's System R project of the
>> mid-1970s (\cite{Gray et al., An Evaluation of System R})
>>
>> If you look at why SQL snuck back in as a layer atop the "Post-SQL
>> systems", it's
>>
>> (a) tooling
>> (b) declarative queries can be optimised by query planners
>> (c) a lot of people who do queries on existing systems can migrate to the
>> new platforms. This is why FB wrote Hive; their PHP GUI teams didn't want
>> to learn Java.
>>
>>
>> > SQL has its place for communication. The last time I showed someone the
>> > Spark DataFrame style, they immediately said it was too difficult to
>> > use. When I changed it to SQL, they were suddenly happy and asked how to
>> > do it. It sounds stupid, but that's how it is for now.
>> >
>>
>> Try showing the Python syntax. Python is an easier language to learn, and
>> its list comprehensions are suspiciously close to applied set theory.
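
Steve's point is not Python-specific: Scala's for-comprehensions have the
same set-builder flavour. A minimal sketch, with an invented Quote type:

// SQL: SELECT price FROM ticker WHERE symbol = 'ACME'
case class Quote(symbol: String, price: Double)

val ticker = Seq(Quote("ACME", 12.0), Quote("XYZ", 8.5), Quote("ACME", 11.5))

// Set-builder reading: { q.price | q <- ticker, q.symbol = "ACME" }
val acmePrices = for (q <- ticker if q.symbol == "ACME") yield q.price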
>>
>>
>>
>>
>>
>>
>


Re: SPARK-SQL: Pattern Detection on Live Event or Archived Event Data

2016-03-02 Thread Jerry Lam
Hi guys,

FYI... this wiki page (StreamSQL: https://en.wikipedia.org/wiki/StreamSQL)
has some history related to Event Stream Processing and SQL.

Hi Steve,

It is difficult to ask your customers to learn a new language when they are
not programmers :)
I don't know where or why they learn SQL-like languages. Do business schools
teach SQL?

Best Regards,

Jerry


On Wed, Mar 2, 2016 at 10:03 AM, Steve Loughran 
wrote:

>
> > On 1 Mar 2016, at 22:25, Jerry Lam  wrote:
> >
> > Hi Reynold,
> >
> > You are right. It is about the audience. For instance, in many of my
> > cases, the SQL style is very attractive, if not mandatory, for people
> > with minimal programming knowledge.
>
> but SQL skills instead. SQL is just relational set theory with a syntax:
> Structured English Query Language, from IBM's System R project of the
> mid-1970s (\cite{Gray et al., An Evaluation of System R})
>
> If you look at why SQL snuck back in as a layer atop the "Post-SQL
> systems", it's
>
> (a) tooling
> (b) declarative queries can be optimised by query planners
> (c) a lot of people who do queries on existing systems can migrate to the
> new platforms. This is why FB wrote Hive; their PHP GUI teams didn't want
> to learn Java.
>
>
> > SQL has its place for communication. The last time I showed someone the
> > Spark DataFrame style, they immediately said it was too difficult to use.
> > When I changed it to SQL, they were suddenly happy and asked how to do
> > it. It sounds stupid, but that's how it is for now.
> >
>
> Try showing the Python syntax. Python is an easier language to learn, and
> its list comprehensions are suspiciously close to applied set theory.
>
>
>
>
>
>


Re: SPARK-SQL: Pattern Detection on Live Event or Archived Event Data

2016-03-01 Thread Jerry Lam
Hi Reynold,

You are right. It is about the audience. For instance, in many of my cases,
the SQL style is very attractive, if not mandatory, for people with minimal
programming knowledge. SQL has its place for communication. The last time I
showed someone the Spark DataFrame style, they immediately said it was too
difficult to use. When I changed it to SQL, they were suddenly happy and
asked how to do it. It sounds stupid, but that's how it is for now.

The following example will make some banks happy (copied from the Oracle
solution):

SELECT *
FROM Ticker MATCH_RECOGNIZE (
 PARTITION BY symbol
 ORDER BY tstamp
 MEASURES  STRT.tstamp AS start_tstamp,
   LAST(DOWN.tstamp) AS bottom_tstamp,
   LAST(UP.tstamp) AS end_tstamp
 ONE ROW PER MATCH
 AFTER MATCH SKIP TO LAST UP
 PATTERN (STRT DOWN+ UP+)
 DEFINE
DOWN AS DOWN.price < PREV(DOWN.price),
UP AS UP.price > PREV(UP.price)
 ) MR
ORDER BY MR.symbol, MR.start_tstamp;

Basically, this query finds all cases where stock prices dipped to a bottom
price and then rose (the popular V-shape). It might be confusing at first,
but it is still readable for many users who know SQL. Note that the PATTERN
clause is interesting: it is a regular expression over the symbols defined
(DOWN and UP; STRT is not defined, so it matches any event).
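
To make the semantics concrete, here is a minimal sketch in plain Scala of
what the query computes for a single partition. The Tick type and the scan
are assumptions for illustration only, not Oracle's implementation:

case class Tick(tstamp: Long, price: Double)

// Scan one ordered partition for PATTERN (STRT DOWN+ UP+), emitting
// (start, bottom, end) per match, as ONE ROW PER MATCH does.
def vShapes(ts: Vector[Tick]): Vector[(Tick, Tick, Tick)] = {
  // longest run from index i where each step satisfies cmp(next, current)
  def run(i: Int, cmp: (Double, Double) => Boolean): Int = {
    var j = i
    while (j + 1 < ts.length && cmp(ts(j + 1).price, ts(j).price)) j += 1
    j
  }
  val out = Vector.newBuilder[(Tick, Tick, Tick)]
  var i = 0
  while (i < ts.length - 1) {
    val bottom = run(i, _ < _)                        // DOWN+: strictly falling
    val top = if (bottom > i) run(bottom, _ > _) else bottom  // UP+: rising
    if (bottom > i && top > bottom) {
      out += ((ts(i), ts(bottom), ts(top)))
      i = top                                         // AFTER MATCH SKIP TO LAST UP
    } else i += 1
  }
  out.result()
}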

Most CEP solutions have a SQL-like interface.

Best Regards,

Jerry



On Tue, Mar 1, 2016 at 4:44 PM, Reynold Xin  wrote:

> There are definitely pros and cons for Scala vs SQL-style CEP. Scala might
> be more powerful, but the target audience is very different.
>
> How much usage is there for a CEP-style SQL syntax in practice? I've never
> seen it come up so far.
>
>
>
> On Tue, Mar 1, 2016 at 9:35 AM, Alex Kozlov  wrote:
>
>> I looked at the paper: while we can argue about the performance side, I
>> think the Scala pattern matching is semantically much more expressive.
>> Time will tell.
>>
>> On Tue, Mar 1, 2016 at 9:07 AM, Jerry Lam  wrote:
>>
>>> Hi Alex,
>>>
>>> We went down this path already :) That is why we are trying other
>>> approaches: the recursion makes it very inefficient in some cases.
>>> For details, this paper describes it very well:
>>> https://people.cs.umass.edu/%7Eyanlei/publications/sase-sigmod08.pdf
>>> It is the same paper referenced in the Flink ticket.
>>>
>>> Please let me know if I overlooked something. Thank you for sharing this!
>>>
>>> Best Regards,
>>>
>>> Jerry
>>>
>>> On Tue, Mar 1, 2016 at 11:58 AM, Alex Kozlov  wrote:
>>>
 For the purpose of full disclosure, I think Scala offers a much more
 efficient pattern-matching paradigm.  Using nPath is like using assembler
 to program distributed systems.  I can't tell much here today, but the
 pattern would look like:

 // PageView and Session stand in for the real event types:
 case class PageView(ts: Long, page: String)
 class Session[A](val id: String, val events: Seq[A])

 // p must be a List for the ::-patterns below to match
 def matchSessions(h: Seq[Session[PageView]], id: String,
                   p: List[PageView]): Seq[Session[PageView]] =
   p match {
     case Nil => Nil
     case PageView(ts1, "company.com>homepage") ::
          PageView(ts2, "company.com>plus>products landing") :: tail
          if ts2 > ts1 + 600 =>
       matchSessions(h, id, tail).+:(new Session(id, p))
     case _ => matchSessions(h, id, p.tail)
   }

 See Scala case statements with guards, and watch for upcoming book releases.

 http://docs.scala-lang.org/tutorials/tour/pattern-matching

 https://www.safaribooksonline.com/library/view/scala-cookbook/9781449340292/ch03s14.html
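
One caveat on Jerry's efficiency point upthread: in matchSessions, the
recursive call in the match case is wrapped in .+:, so every match adds a
stack frame. A hedged sketch of the usual fix, an accumulator that keeps
each call in tail position (names follow the example above):

import scala.annotation.tailrec

@tailrec
def scanSessions(id: String, p: List[PageView],
                 acc: List[Session[PageView]] = Nil): List[Session[PageView]] =
  p match {
    case Nil => acc.reverse
    case PageView(ts1, "company.com>homepage") ::
         PageView(ts2, "company.com>plus>products landing") :: tail
         if ts2 > ts1 + 600 =>
      scanSessions(id, tail, new Session(id, p) :: acc)  // accumulate, don't wrap
    case _ => scanSessions(id, p.tail, acc)
  }

This only addresses stack depth; the SASE paper's deeper argument is for
NFA-based evaluation rather than backtracking, which is what the Flink
ticket builds on.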

 On Tue, Mar 1, 2016 at 8:34 AM, Henri Dubois-Ferriere <
 henr...@gmail.com> wrote:

> fwiw Apache Flink just added CEP. Queries are constructed
> programmatically rather than in SQL, but the underlying functionality is
> similar.
>
> https://issues.apache.org/jira/browse/FLINK-3215
>
> On 1 March 2016 at 08:19, Jerry Lam  wrote:
>
>> Hi Herman,
>>
>> Thank you for your reply!
>> This functionality usually finds its place in financial services, which
>> use CEP (complex event processing) for correlation and pattern matching.
>> Many commercial products have this, including Oracle and Teradata Aster
>> Data MR Analytics. I do agree the syntax is a bit awkward, but once you
>> understand it, it is actually very compact for expressing something that
>> is very complex. Esper has this feature partially implemented (
>> http://www.espertech.com/esper/release-5.1.0/esper-reference/html/match-recognize.html
>> ).
>>
>> I found the Teradata Analytics documentation describes its usage best.
>> For example (note that nPath is similar to MATCH_RECOGNIZE):
>>
>> SELECT last_pageid, MAX( count_page80 )
>>  FROM nPath(
>>  ON ( SELECT * FROM clicks WHERE category >= 0 )

Re: SPARK-SQL: Pattern Detection on Live Event or Archived Event Data

2016-03-01 Thread Reynold Xin
There are definitely pros and cons for Scala vs SQL-style CEP. Scala might
be more powerful, but the target audience is very different.

How much usage is there for a CEP-style SQL syntax in practice? I've never
seen it come up so far.



On Tue, Mar 1, 2016 at 9:35 AM, Alex Kozlov  wrote:

> I looked at the paper: while we can argue about the performance side, I
> think the Scala pattern matching is semantically much more expressive.
> Time will tell.
>
> On Tue, Mar 1, 2016 at 9:07 AM, Jerry Lam  wrote:
>
>> Hi Alex,
>>
>> We went down this path already :) That is why we are trying other
>> approaches: the recursion makes it very inefficient in some cases.
>> For details, this paper describes it very well:
>> https://people.cs.umass.edu/%7Eyanlei/publications/sase-sigmod08.pdf
>> It is the same paper referenced in the Flink ticket.
>>
>> Please let me know if I overlooked something. Thank you for sharing this!
>>
>> Best Regards,
>>
>> Jerry
>>
>> On Tue, Mar 1, 2016 at 11:58 AM, Alex Kozlov  wrote:
>>
>>> For the purpose of full disclosure, I think Scala offers a much more
>>> efficient pattern-matching paradigm.  Using nPath is like using assembler
>>> to program distributed systems.  I can't tell much here today, but the
>>> pattern would look like:
>>>
>>> // PageView and Session stand in for the real event types:
>>> case class PageView(ts: Long, page: String)
>>> class Session[A](val id: String, val events: Seq[A])
>>>
>>> // p must be a List for the ::-patterns below to match
>>> def matchSessions(h: Seq[Session[PageView]], id: String,
>>>                   p: List[PageView]): Seq[Session[PageView]] =
>>>   p match {
>>>     case Nil => Nil
>>>     case PageView(ts1, "company.com>homepage") ::
>>>          PageView(ts2, "company.com>plus>products landing") :: tail
>>>          if ts2 > ts1 + 600 =>
>>>       matchSessions(h, id, tail).+:(new Session(id, p))
>>>     case _ => matchSessions(h, id, p.tail)
>>>   }
>>>
>>> See Scala case statements with guards, and watch for upcoming book releases.
>>>
>>> http://docs.scala-lang.org/tutorials/tour/pattern-matching
>>>
>>> https://www.safaribooksonline.com/library/view/scala-cookbook/9781449340292/ch03s14.html
>>>
>>> On Tue, Mar 1, 2016 at 8:34 AM, Henri Dubois-Ferriere wrote:
>>>
 fwiw Apache Flink just added CEP. Queries are constructed
 programmatically rather than in SQL, but the underlying functionality is
 similar.

 https://issues.apache.org/jira/browse/FLINK-3215

 On 1 March 2016 at 08:19, Jerry Lam  wrote:

> Hi Herman,
>
> Thank you for your reply!
> This functionality usually finds its place in financial services, which
> use CEP (complex event processing) for correlation and pattern matching.
> Many commercial products have this, including Oracle and Teradata Aster
> Data MR Analytics. I do agree the syntax is a bit awkward, but once you
> understand it, it is actually very compact for expressing something that
> is very complex. Esper has this feature partially implemented (
> http://www.espertech.com/esper/release-5.1.0/esper-reference/html/match-recognize.html
> ).
>
> I found the Teradata Analytics documentation describes its usage best.
> For example (note that nPath is similar to MATCH_RECOGNIZE):
>
> SELECT last_pageid, MAX( count_page80 )
>  FROM nPath(
>  ON ( SELECT * FROM clicks WHERE category >= 0 )
>  PARTITION BY sessionid
>  ORDER BY ts
>  PATTERN ( 'A.(B|C)*' )
>  MODE ( OVERLAPPING )
>  SYMBOLS ( pageid = 50 AS A,
>pageid = 80 AS B,
>pageid <> 80 AND category IN (9,10) AS C )
>  RESULT ( LAST ( pageid OF ANY ( A,B,C ) ) AS last_pageid,
>   COUNT ( * OF B ) AS count_page80,
>   COUNT ( * OF ANY ( A,B,C ) ) AS count_any )
>  )
>  WHERE count_any >= 5
>  GROUP BY last_pageid
>  ORDER BY MAX( count_page80 )
>
> The above means:
> Find user click-paths starting at pageid 50 and passing exclusively
> through either pageid 80 or pages in category 9 or category 10. Find the
> pageid of the last page in the path and count the number of times page 80
> was visited. Report the maximum count for each last page, and sort the
> output by the latter. Restrict to paths containing at least 5 pages. Ignore
> pages in the sequence with category < 0.
>
> If this query were written in pure SQL (if possible at all), it would
> require several self-joins. The interesting thing about this feature is
> that it integrates SQL + streaming + ML in one (perhaps graphs too).
>
> Best Regards,
>
> Jerry
>
>
> On Tue, Mar 1, 2016 at 9:39 AM, Herman van Hövell tot Westerflier <
> hvanhov...@questtec.nl> wrote:
>
>> Hi Jerry,
>>
>> This is not on any roadmap. I (briefly) browsed through this, and it
>> looks like some sort of window function with very awkward syntax. I
>> think Spark provides better constructs for this using
>> dataframes/datasets/nested data...

Re: SPARK-SQL: Pattern Detection on Live Event or Archived Event Data

2016-03-01 Thread Jerry Lam
Hi Henri,

Finally, there is a good reason for me to use Flink! Thanks for sharing
this information. This is exactly the solution I'm looking for, especially
since the ticket references a paper I was reading a week ago. It would be
nice if Flink added SQL support, because that would give business analysts
(and traders) a way to express it.

Best Regards,

Jerry

On Tue, Mar 1, 2016 at 11:34 AM, Henri Dubois-Ferriere 
wrote:

> fwiw Apache Flink just added CEP. Queries are constructed programmatically
> rather than in SQL, but the underlying functionality is similar.
>
> https://issues.apache.org/jira/browse/FLINK-3215
>
> On 1 March 2016 at 08:19, Jerry Lam  wrote:
>
>> Hi Herman,
>>
>> Thank you for your reply!
>> This functionality usually finds its place in financial services, which
>> use CEP (complex event processing) for correlation and pattern matching.
>> Many commercial products have this, including Oracle and Teradata Aster
>> Data MR Analytics. I do agree the syntax is a bit awkward, but once you
>> understand it, it is actually very compact for expressing something that
>> is very complex. Esper has this feature partially implemented (
>> http://www.espertech.com/esper/release-5.1.0/esper-reference/html/match-recognize.html
>> ).
>>
>> I found the Teradata Analytics documentation describes its usage best.
>> For example (note that nPath is similar to MATCH_RECOGNIZE):
>>
>> SELECT last_pageid, MAX( count_page80 )
>>  FROM nPath(
>>  ON ( SELECT * FROM clicks WHERE category >= 0 )
>>  PARTITION BY sessionid
>>  ORDER BY ts
>>  PATTERN ( 'A.(B|C)*' )
>>  MODE ( OVERLAPPING )
>>  SYMBOLS ( pageid = 50 AS A,
>>pageid = 80 AS B,
>>pageid <> 80 AND category IN (9,10) AS C )
>>  RESULT ( LAST ( pageid OF ANY ( A,B,C ) ) AS last_pageid,
>>   COUNT ( * OF B ) AS count_page80,
>>   COUNT ( * OF ANY ( A,B,C ) ) AS count_any )
>>  )
>>  WHERE count_any >= 5
>>  GROUP BY last_pageid
>>  ORDER BY MAX( count_page80 )
>>
>> The above means:
>> Find user click-paths starting at pageid 50 and passing exclusively
>> through either pageid 80 or pages in category 9 or category 10. Find the
>> pageid of the last page in the path and count the number of times page 80
>> was visited. Report the maximum count for each last page, and sort the
>> output by the latter. Restrict to paths containing at least 5 pages. Ignore
>> pages in the sequence with category < 0.
>>
>> If this query were written in pure SQL (if possible at all), it would
>> require several self-joins. The interesting thing about this feature is
>> that it integrates SQL + streaming + ML in one (perhaps graphs too).
>>
>> Best Regards,
>>
>> Jerry
>>
>>
>> On Tue, Mar 1, 2016 at 9:39 AM, Herman van Hövell tot Westerflier <
>> hvanhov...@questtec.nl> wrote:
>>
>>> Hi Jerry,
>>>
>>> This is not on any roadmap. I (briefly) browsed through this, and it
>>> looks like some sort of window function with very awkward syntax. I think
>>> Spark provides better constructs for this using dataframes/datasets/nested
>>> data...
>>>
>>> Feel free to submit a PR.
>>>
>>> Kind regards,
>>>
>>> Herman van Hövell
>>>
>>> 2016-03-01 15:16 GMT+01:00 Jerry Lam :
>>>
 Hi Spark developers,

 Would you consider adding support for "Pattern matching in
 sequences of rows"? More specifically, I'm referring to this:
 http://web.cs.ucla.edu/classes/fall15/cs240A/notes/temporal/row-pattern-recogniton-11.pdf

 This is a very cool/useful feature for pattern matching over live-stream
 or archived data. It is sort of related to machine learning, because it is
 usually used in clickstream or path analysis. It is also related to
 streaming because of the nature of the processing (mostly time-series
 data). And it is SQL because there is a good way to express and optimize
 the query.

 Best Regards,

 Jerry

>>>
>>>
>>
>


Re: SPARK-SQL: Pattern Detection on Live Event or Archived Event Data

2016-03-01 Thread Henri Dubois-Ferriere
fwiw Apache Flink just added CEP. Queries are constructed programmatically
rather than in SQL, but the underlying functionality is similar.

https://issues.apache.org/jira/browse/FLINK-3215

On 1 March 2016 at 08:19, Jerry Lam  wrote:

> Hi Herman,
>
> Thank you for your reply!
> This functionality usually finds its place in financial services, which use
> CEP (complex event processing) for correlation and pattern matching. Many
> commercial products have this, including Oracle and Teradata Aster Data MR
> Analytics. I do agree the syntax is a bit awkward, but once you understand
> it, it is actually very compact for expressing something that is very
> complex. Esper has this feature partially implemented (
> http://www.espertech.com/esper/release-5.1.0/esper-reference/html/match-recognize.html
> ).
>
> I found the Teradata Analytics documentation describes its usage best.
> For example (note that nPath is similar to MATCH_RECOGNIZE):
>
> SELECT last_pageid, MAX( count_page80 )
>  FROM nPath(
>  ON ( SELECT * FROM clicks WHERE category >= 0 )
>  PARTITION BY sessionid
>  ORDER BY ts
>  PATTERN ( 'A.(B|C)*' )
>  MODE ( OVERLAPPING )
>  SYMBOLS ( pageid = 50 AS A,
>pageid = 80 AS B,
>pageid <> 80 AND category IN (9,10) AS C )
>  RESULT ( LAST ( pageid OF ANY ( A,B,C ) ) AS last_pageid,
>   COUNT ( * OF B ) AS count_page80,
>   COUNT ( * OF ANY ( A,B,C ) ) AS count_any )
>  )
>  WHERE count_any >= 5
>  GROUP BY last_pageid
>  ORDER BY MAX( count_page80 )
>
> The above means:
> Find user click-paths starting at pageid 50 and passing exclusively
> through either pageid 80 or pages in category 9 or category 10. Find the
> pageid of the last page in the path and count the number of times page 80
> was visited. Report the maximum count for each last page, and sort the
> output by the latter. Restrict to paths containing at least 5 pages. Ignore
> pages in the sequence with category < 0.
>
> If this query were written in pure SQL (if possible at all), it would
> require several self-joins. The interesting thing about this feature is
> that it integrates SQL + streaming + ML in one (perhaps graphs too).
>
> Best Regards,
>
> Jerry
>
>
> On Tue, Mar 1, 2016 at 9:39 AM, Herman van Hövell tot Westerflier <
> hvanhov...@questtec.nl> wrote:
>
>> Hi Jerry,
>>
>> This is not on any roadmap. I (briefly) browsed through this, and it
>> looks like some sort of window function with very awkward syntax. I think
>> Spark provides better constructs for this using dataframes/datasets/nested
>> data...
>>
>> Feel free to submit a PR.
>>
>> Kind regards,
>>
>> Herman van Hövell
>>
>> 2016-03-01 15:16 GMT+01:00 Jerry Lam :
>>
>>> Hi Spark developers,
>>>
>>> Would you consider adding support for "Pattern matching in
>>> sequences of rows"? More specifically, I'm referring to this:
>>> http://web.cs.ucla.edu/classes/fall15/cs240A/notes/temporal/row-pattern-recogniton-11.pdf
>>>
>>> This is a very cool/useful feature for pattern matching over live-stream
>>> or archived data. It is sort of related to machine learning, because it is
>>> usually used in clickstream or path analysis. It is also related to
>>> streaming because of the nature of the processing (mostly time-series
>>> data). And it is SQL because there is a good way to express and optimize
>>> the query.
>>>
>>> Best Regards,
>>>
>>> Jerry
>>>
>>
>>
>


Re: SPARK-SQL: Pattern Detection on Live Event or Archived Event Data

2016-03-01 Thread Jerry Lam
Hi Herman,

Thank you for your reply!
This functionality usually finds its place in financial services, which use
CEP (complex event processing) for correlation and pattern matching. Many
commercial products have this, including Oracle and Teradata Aster Data MR
Analytics. I do agree the syntax is a bit awkward, but once you understand
it, it is actually very compact for expressing something that is very
complex. Esper has this feature partially implemented (
http://www.espertech.com/esper/release-5.1.0/esper-reference/html/match-recognize.html
).

I found the Teradata Analytics documentation describes its usage best.
For example (note that nPath is similar to MATCH_RECOGNIZE):

SELECT last_pageid, MAX( count_page80 )
 FROM nPath(
 ON ( SELECT * FROM clicks WHERE category >= 0 )
 PARTITION BY sessionid
 ORDER BY ts
 PATTERN ( 'A.(B|C)*' )
 MODE ( OVERLAPPING )
 SYMBOLS ( pageid = 50 AS A,
   pageid = 80 AS B,
   pageid <> 80 AND category IN (9,10) AS C )
 RESULT ( LAST ( pageid OF ANY ( A,B,C ) ) AS last_pageid,
  COUNT ( * OF B ) AS count_page80,
  COUNT ( * OF ANY ( A,B,C ) ) AS count_any )
 )
 WHERE count_any >= 5
 GROUP BY last_pageid
 ORDER BY MAX( count_page80 )

The above means:
Find user click-paths starting at pageid 50 and passing exclusively through
either pageid 80 or pages in category 9 or category 10. Find the pageid of
the last page in the path and count the number of times page 80 was
visited. Report the maximum count for each last page, and sort the output
by the latter. Restrict to paths containing at least 5 pages. Ignore pages
in the sequence with category < 0.
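
As a reading aid, here is a rough sketch of the nPath idea in plain Scala:
classify each ordered row into a symbol, then run an ordinary regex over the
symbol string. The Click type and helpers are invented, and the translation
reads the '.' in 'A.(B|C)*' as nPath's symbol separator; Aster's actual
engine works differently:

case class Click(sessionid: String, ts: Long, pageid: Int, category: Int)

// One row's symbol, mirroring the SYMBOLS clause; rows matching no symbol
// become '.', which the pattern never accepts.
def symbol(c: Click): Char =
  if (c.pageid == 50) 'A'
  else if (c.pageid == 80) 'B'
  else if (c.pageid != 80 && (c.category == 9 || c.category == 10)) 'C'
  else '.'

// Click-paths in one session matching PATTERN ('A.(B|C)*') under
// MODE (OVERLAPPING): try a match starting at every row.
def paths(session: Seq[Click]): Seq[Seq[Click]] = {
  val rows = session.sortBy(_.ts)
  val syms = rows.map(symbol).mkString
  val re = "A[BC]*".r
  rows.indices.flatMap { i =>
    re.findPrefixMatchOf(syms.substring(i))
      .map(m => rows.slice(i, i + m.matched.length))
  }
}

The RESULT clause then reduces each matched path (last pageid, counts), and
the outer WHERE/GROUP BY aggregate over those per-match rows.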

If this query were written in pure SQL (if possible at all), it would
require several self-joins. The interesting thing about this feature is
that it integrates SQL + streaming + ML in one (perhaps graphs too).

Best Regards,

Jerry


On Tue, Mar 1, 2016 at 9:39 AM, Herman van Hövell tot Westerflier <
hvanhov...@questtec.nl> wrote:

> Hi Jerry,
>
> This is not on any roadmap. I (briefly) browsed through this, and it
> looks like some sort of window function with very awkward syntax. I think
> Spark provides better constructs for this using dataframes/datasets/nested
> data...
>
> Feel free to submit a PR.
>
> Kind regards,
>
> Herman van Hövell
>
> 2016-03-01 15:16 GMT+01:00 Jerry Lam :
>
>> Hi Spark developers,
>>
>> Would you consider adding support for "Pattern matching in
>> sequences of rows"? More specifically, I'm referring to this:
>> http://web.cs.ucla.edu/classes/fall15/cs240A/notes/temporal/row-pattern-recogniton-11.pdf
>>
>> This is a very cool/useful feature for pattern matching over live-stream
>> or archived data. It is sort of related to machine learning, because it is
>> usually used in clickstream or path analysis. It is also related to
>> streaming because of the nature of the processing (mostly time-series
>> data). And it is SQL because there is a good way to express and optimize
>> the query.
>>
>> Best Regards,
>>
>> Jerry
>>
>
>


Re: SPARK-SQL: Pattern Detection on Live Event or Archived Event Data

2016-03-01 Thread Herman van Hövell tot Westerflier
Hi Jerry,

This is not on any roadmap. I (briefly) browsed through this, and it looks
like some sort of window function with very awkward syntax. I think Spark
provides better constructs for this using dataframes/datasets/nested
data...
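
For concreteness, a minimal sketch of the window-function direction Herman
suggests, assuming a DataFrame ticker(symbol, tstamp, price) is in scope.
Flagging single DOWN/UP steps is easy; stitching them into DOWN+ UP+ runs is
the part MATCH_RECOGNIZE automates:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, lag, when}

// Label each row with the direction of its price step within its symbol.
def classifySteps(ticker: DataFrame): DataFrame = {
  val w = Window.partitionBy("symbol").orderBy("tstamp")
  val prev = lag("price", 1).over(w)
  ticker.withColumn("step",
    when(col("price") < prev, "DOWN")
      .when(col("price") > prev, "UP")
      .otherwise("FLAT"))
}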

Feel free to submit a PR.

Kind regards,

Herman van Hövell

2016-03-01 15:16 GMT+01:00 Jerry Lam :

> Hi Spark developers,
>
> Would you consider adding support for "Pattern matching in
> sequences of rows"? More specifically, I'm referring to this:
> http://web.cs.ucla.edu/classes/fall15/cs240A/notes/temporal/row-pattern-recogniton-11.pdf
>
> This is a very cool/useful feature for pattern matching over live-stream
> or archived data. It is sort of related to machine learning, because it is
> usually used in clickstream or path analysis. It is also related to
> streaming because of the nature of the processing (mostly time-series
> data). And it is SQL because there is a good way to express and optimize
> the query.
>
> Best Regards,
>
> Jerry
>


SPARK-SQL: Pattern Detection on Live Event or Archived Event Data

2016-03-01 Thread Jerry Lam
Hi Spark developers,

Would you consider adding support for "Pattern matching in sequences of
rows"? More specifically, I'm referring to this:
http://web.cs.ucla.edu/classes/fall15/cs240A/notes/temporal/row-pattern-recogniton-11.pdf

This is a very cool/useful feature for pattern matching over live-stream or
archived data. It is sort of related to machine learning, because it is
usually used in clickstream or path analysis. It is also related to
streaming because of the nature of the processing (mostly time-series
data). And it is SQL because there is a good way to express and optimize
the query.

Best Regards,

Jerry