Hi all,
Sorry if this isn't the right place to ask basic questions, but I'm at the end
of my rope here - please let me know where else I can get help if this isn't
the right place.
I'm trying to continuously read from a Kafka topic and send the number of rows
Spark has received to a metric
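A minimal sketch of one common way to do this with Structured Streaming: register a `StreamingQueryListener` and read `numInputRows` from each progress event. This assumes `spark` is an existing `SparkSession` and that you would replace the `println` with a call to your own metrics client (hypothetical here); it is a sketch, not a complete application.

```scala
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener.{
  QueryStartedEvent, QueryProgressEvent, QueryTerminatedEvent}

spark.streams.addListener(new StreamingQueryListener {
  override def onQueryStarted(event: QueryStartedEvent): Unit = ()
  override def onQueryProgress(event: QueryProgressEvent): Unit = {
    // numInputRows = rows ingested (e.g. from Kafka) since the last progress update;
    // forward this to your metrics system instead of printing.
    println(s"rows received: ${event.progress.numInputRows}")
  }
  override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()
})
```

The listener fires once per micro-batch progress update, so the counts are per-trigger, not cumulative.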
Hi all,
I spent some time thinking about the roadmap, and came up with an initial
list:
SPARK-25390: data source V2 API refactoring
SPARK-24252: add catalog support
SPARK-25531: new write APIs for data source v2
SPARK-25190: better operator pushdown API
Streaming rate control API
Custom metrics
OK, thanks for clarifying. I guess it is one of the major features in the
streaming area and would be nice to add, but I also agree it would require
significant investigation.
On Wed, Oct 31, 2018 at 8:06 AM, Michael Armbrust wrote:
> Agree. Just curious, could you explain what you mean by "negation"?
> Does it mean applying retraction to aggregated results?
>
Yeah, exactly. Our current streaming aggregation assumes that the input is
in append mode, and multiple aggregations break this assumption.
Thanks Michael for explaining the activity on SS as well as giving opinions
on some items!
Replying inline.
On Wed, Oct 31, 2018 at 5:44 AM, Michael Armbrust wrote:
> Thanks for bringing up some possible future directions for streaming. Here
> are some thoughts:
> - I personally view all of the activity
+1
On Tue, Oct 30, 2018 at 4:42 AM Wenchen Fan wrote:
> Thanks for reporting the bug! I'll list it as a known issue for 2.4.0
>
> I'm adding my own +1, since all the known blockers are resolved.
>
> On Tue, Oct 30, 2018 at 2:56 PM Xiao Li wrote:
>
>> Yes, this is not a blocker.
>>
@Michael any update about queryable state?
Stavros
On Tue, Oct 30, 2018 at 10:43 PM, Michael Armbrust wrote:
> Thanks for bringing up some possible future directions for streaming. Here
> are some thoughts:
> - I personally view all of the activity on Spark SQL also as activity on
>
Thanks for bringing up some possible future directions for streaming. Here
are some thoughts:
- I personally view all of the activity on Spark SQL also as activity on
Structured Streaming. The great thing about building streaming on catalyst
/ tungsten is that continued improvement to these
Hi Reynold,
Thank you for your comments. They are great points.
1) Yes, it is not easy to design an IR that is expressive enough. We can
learn concepts from good examples like HyPer, Weld, and others. They are
expressive and not complicated. The details cannot be captured yet,
2) To introduce
Thanks for reporting the bug! I'll list it as a known issue for 2.4.0
I'm adding my own +1, since all the known blockers are resolved.
On Tue, Oct 30, 2018 at 2:56 PM Xiao Li wrote:
> Yes, this is not a blocker.
> "spark.sql.optimizer.nestedSchemaPruning.enabled" is intentionally off by
>
Hi,
Just ran into it today and wonder whether it's a bug or something I may
have missed before.
scala> spark.version
res21: String = 2.3.2
// that's OK
scala> spark.range(1).write.saveAsTable("t1")
org.apache.spark.sql.AnalysisException: Table `t1` already exists.;
at
The duplicated link problem still seems to persist:
https://issues.apache.org/jira/browse/SPARK-25881
https://issues.apache.org/jira/browse/SPARK-25880
I suspect there are two places that run this script. Not a big deal, but
only specific people can fix this.
I am leaving another reminder
Adding more: again, this doesn't mean they're all feasible to do. It's just
a kind of brainstorming.
* SPARK-20568: Delete files after processing in structured streaming
* There hasn't been consensus regarding supporting this: there were
voices for both YES and NO.
* Support multiple levels of
Yes, this is not a blocker.
"spark.sql.optimizer.nestedSchemaPruning.enabled" is intentionally off by
default. As DB Tsai said, column pruning of nested schema for Parquet
tables is experimental. In this release, we encourage the whole community
to try this new feature but it might have bugs like
+0
I understand that schema pruning is an experimental feature in Spark
2.4, and it can help a lot with read performance as people try
to keep hierarchical data in nested format.
We just found a serious bug: it could fail the Parquet reader if a nested
field and a top-level field are