Re: Joining DataFrames derived from the same source yields confusing/incorrect results

2018-08-29 Tomasz Gawęda
Hi, the tweet linked in the issue suggests an error in Spark, but I didn't dig into it to find the root cause. At the very least, it's quite confusing behaviour. Best regards, Tomek. On 29.08.2018 6:44 PM, Nicholas Chammas wrote: Dunno if I made a silly mistake, but I wanted to bring some attention
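The snippet does not include the code from the issue, so this is not a reproduction of the reported behaviour. It is a hedged sketch of the usual defensive pattern when joining two DataFrames that share one parent: give each side its own alias and refer to columns through those aliases, because references taken from the shared parent (such as `df("id")`) can be ambiguous after the join. It assumes Spark 2.x on the classpath and a local session; the aliases `a` and `b` and the object name are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object SelfJoinAliasSketch {
  // Joins a DataFrame with a filtered copy of itself. Both sides share the
  // same lineage, so each side gets its own alias before the join.
  def joinDerived(spark: SparkSession): DataFrame = {
    val source = spark.range(0, 5).toDF("id")         // ids 0..4
    val left = source.as("a")
    val right = source.filter(col("id") > 2).as("b")  // ids 3 and 4
    // Qualified references ("a.id", "b.id") tell the analyzer which side is meant.
    left.join(right, col("a.id") === col("b.id"))
      .select(col("a.id").as("left_id"), col("b.id").as("right_id"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("self-join-alias").getOrCreate()
    joinDerived(spark).show()
    spark.stop()
  }
}
```

With the aliases in place, only ids present on both sides (3 and 4) survive the inner join.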

Re: code freeze and branch cut for Apache Spark 2.4

2018-07-31 Tomasz Gawęda
Hi, what is the status of Continuous Processing + Aggregations? As far as I remember, Jose Torres said it should be easy to perform aggregations if coalesce(1) works. IIRC it's already merged to master. Is this work in progress? If so, it would be great to have full aggregation/join support

Re: Preventing predicate pushdown

2018-05-15 Tomasz Gawęda
, Wenchen On Tue, May 15, 2018 at 8:33 PM, Tomasz Gawęda <tomasz.gaw...@outlook.com> wrote: Hi, while working with the JDBC data source I saw that many "or" clauses with non-equality operators cause a huge performance degradation of the SQL

Preventing predicate pushdown

2018-05-15 Tomasz Gawęda
Hi, while working with the JDBC data source I saw that many "or" clauses with non-equality operators cause a huge performance degradation of the SQL query sent to the database (DB2). For example: val df = spark.read.format("jdbc").(other options to parallelize load).load() df.where(s"(date1 > $param1 and
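One way to stop Spark from sending such predicates to the database is the JDBC reader option `pushDownPredicate`. As far as we know it is documented in the Spark 2.4 JDBC data source guide, i.e. in releases after this thread, so check the documentation for your version before relying on it. The sketch below assumes that option is available; the URL, table name, and object name are hypothetical placeholders, and the partitioning options mentioned in the post are omitted.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JdbcNoPushdownSketch {
  // Builds reader options for a JDBC source. The URL and table are hypothetical
  // placeholders; partitioning options (partitionColumn, numPartitions, ...) are omitted.
  def jdbcOptions(url: String, table: String): Map[String, String] = Map(
    "url" -> url,
    "dbtable" -> table,
    // Assumes a Spark version that supports this option (documented for 2.4+):
    // with "false", Spark applies filters itself instead of sending them to the database.
    "pushDownPredicate" -> "false"
  )

  // Loads the table; any later .where(...) on the result is evaluated by Spark.
  def load(spark: SparkSession, url: String, table: String): DataFrame =
    spark.read.format("jdbc").options(jdbcOptions(url, table)).load()
}
```

The trade-off is that the database no longer filters anything, so every row in the table (or partition range) is transferred to Spark before the OR-heavy condition is applied.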

Re: Dataset.localCheckpoint?

2018-01-23 Tomasz Gawęda
Hi, sorry again, I was wrong: it was added in 2.3 by Fernando Pereira. Best regards, Tomek Gawęda On 2018-01-22 19:32, Tomasz Gawęda wrote: > Hi, > > Today I saw that there is no localCheckpoint() function in Dataset. Is > there any reason for that? Checkpointing
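Since `Dataset.localCheckpoint` exists from Spark 2.3, here is a minimal sketch, assuming Spark 2.3+ and a local session, that grows a logical plan in a loop and then truncates it. Unlike `checkpoint()`, a local checkpoint needs no checkpoint directory: the data is written to executor storage, which is cheaper than saving the whole Dataset to reliable storage but is not fault-tolerant if an executor is lost. The object and method names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object LocalCheckpointSketch {
  // Repeatedly derives a new DataFrame, which makes the logical plan deeper each time,
  // then cuts the lineage with a local (executor-storage) checkpoint.
  def iterateAndTruncate(spark: SparkSession, iterations: Int): DataFrame = {
    var df = spark.range(0, 1000).toDF("id")
    for (_ <- 1 to iterations) {
      df = df.withColumn("id", col("id") + 1)
    }
    // Argument is `eager`: true materializes the data now; no checkpoint directory is required.
    df.localCheckpoint(true)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("local-checkpoint").getOrCreate()
    val truncated = iterateAndTruncate(spark, 10)
    truncated.explain() // the plan now starts from the checkpointed data, not from the 10 projections
    println(truncated.count())
    spark.stop()
  }
}
```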

Dataset.localCheckpoint?

2018-01-22 Tomasz Gawęda
Hi, today I noticed that there is no localCheckpoint() function in Dataset. Is there any reason for that? Checkpointing can truncate logical plans, but in some cases it's quite expensive to save the whole Dataset to disk. Is there any workaround for this? Best regards, Tomek Gawęda

Re: Broken SQL Visualization?

2018-01-17 Tomasz Gawęda
:07 AM, Ted Yu <yuzhih...@gmail.com> wrote: Did you include any picture? Looks like the picture didn't go thru. Please use a third-party site. Thanks. Original message: From: Tomasz Gawęda <tomasz.gaw...@outlook.com>

Broken SQL Visualization?

2018-01-15 Tomasz Gawęda
Hi, today I updated my test cluster to the current Spark master; after that, my SQL Visualization page started to crash with the following JS error: [screenshot] The screenshot was cropped for readability and to hide internal server names ;) It may be caused by the upgrade or

SQL Visualization for cached Dataset

2018-01-02 Tomasz Gawęda
Hi, recently I had to optimize a few Apache Spark SQL queries. Some of the Datasets were reused, so they were cached. However, after caching I don't see the SQL Visualization for the cached Dataset in the Spark UI; I see only an InMemoryRelation node. The explain output at the bottom of the page still has
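To see the effect described here outside the UI, a minimal sketch, assuming Spark 2.x and a local session, caches a DataFrame and then prints the optimized logical plan of a query built on top of it. The cached subtree is replaced by a single `InMemoryRelation` node (executed as `InMemoryTableScan` in the physical plan), so the operators that produced the cached data are no longer part of the query's plan. The `bucket` column and object name are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object CachedPlanSketch {
  // Returns the optimized logical plan of a query that reads from a cached DataFrame.
  def planOverCachedData(spark: SparkSession): String = {
    val base = spark.range(0, 100).toDF("id")
      .withColumn("bucket", col("id") % 10) // hypothetical derived column
    base.cache() // marks base for caching; the data is materialized on the first action
    val query = base.filter(col("bucket") === 3).select("id")
    query.queryExecution.optimizedPlan.toString
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("cached-plan").getOrCreate()
    // The cached subtree shows up as one InMemoryRelation node rather than Range + Project.
    println(planOverCachedData(spark))
    spark.stop()
  }
}
```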

Re: Spark Improvement Proposals

2016-10-17 Tomasz Gawęda
ys) Best regards, Tomasz From: Cody Koeninger <c...@koeninger.org> Sent: 17 October 2016 16:46 To: Debasish Das Cc: Tomasz Gawęda; dev@spark.apache.org Subject: Re: Spark Improvement Proposals I think narrowly focusing on Flink or benchmarks is missing

Re: Spark Improvement Proposals

2016-10-16 Tomasz Gawęda
Hi everyone, I'm quite late with my answer, but I think my suggestions may help a little bit. :) Many technical and organizational topics were mentioned, but I want to focus on the negative posts about Spark and about "haters". I really like Spark. Ease of use, speed, a very good community -

Real time streaming in Spark

2016-08-29 Tomasz Gawęda
Hi everyone, I wonder if there are plans to implement real-time streaming in Spark. I see that in Spark 2.0 Trigger can have implementations other than ProcessingTime. In my opinion, real-time streaming (reacting to every event, like continuous queries in Apache Ignite) will be very useful

Re: spark2.0 can't run SqlNetworkWordCount

2016-07-25 Tomasz Gawęda
Hi, please change the Scala version to 2.11. As far as I know, Spark packages are now built with Scala 2.11, and you are using a different version (2.10). From: kevin Sent: 25 July 2016 11:33 To: user.spark; dev.spark Subject: spark2.0 can't run
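For an sbt project, matching the Scala version comes down to two settings. A minimal `build.sbt` sketch, assuming Spark 2.0.0 (whose default artifacts are built for Scala 2.11) and the streaming/SQL example named in the subject; the project name and the exact patch versions are illustrative. The `%%` operator appends the Scala binary version (here `_2.11`) to the artifact name, so the Spark jars and the project are compiled against the same Scala line.

```scala
// build.sbt (sketch): Spark 2.0.x artifacts are published for Scala 2.11 by default.
name := "sql-network-word-count" // hypothetical project name

scalaVersion := "2.11.8"

// "%%" resolves to spark-streaming_2.11 and spark-sql_2.11, matching scalaVersion above.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming" % "2.0.0",
  "org.apache.spark" %% "spark-sql" % "2.0.0"
)
```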