All,

I have three questions:

(1) Is there a timeline for a stable Spark 2.0 release?  I know the 'preview'
build is out there, but I was curious what the timeline is for the full
release. Jira seems to indicate a release on 7/27.

(2) There has been a lot of discussion around 'continuous' (streaming)
datasets. One item that came up in tickets was that 'count()' and other
batch-style actions do not apply to continuous datasets:
https://github.com/apache/spark/pull/12080.
Given that, what is the intended procedure for calculating a streaming
statistic over an interval (e.g. counting the number of records in a
2-minute window every 2 minutes)?

(3) In previous releases (1.6.1), calling repartition on a DStream or RDD
with the number of partitions set to zero silently drops data.  I have
searched Jira for a similar issue, but I do not see one.  I would like to
address this (and would likely be willing to fix it myself).  Should I just
create a ticket?

Thank you,

Bryan Jeffrey
