Re: Resolving too-old JIRAs as Incomplete

2021-05-24 Thread Takeshi Yamamuro
On Tue, May 25, 2021 at 11:00 AM Hyukjin Kwon wrote: > Awesome, thanks Takeshi! > > On Tue, May 25, 2021 at 10:59 AM, Takeshi Yamamuro wrote: > >> FYI: >> >> Thank you for all the comments. >> I closed 754 tickets in bulk a few minutes ago. >> Please let me know if there is any problem. >> >> Bests,

Re: Resolving too-old JIRAs as Incomplete

2021-05-24 Thread Takeshi Yamamuro
FYI: Thank you for all the comments. I closed 754 tickets in bulk a few minutes ago. Please let me know if there is any problem. Bests, Takeshi On Fri, May 21, 2021 at 10:29 AM Kent Yao wrote: > +1, thanks Takeshi > > Kent Yao > @ Data Science Center, Hangzhou Research Institute, NetEase

Re: About Spark executing SQL scripts

2021-05-24 Thread Mich Talebzadeh
Apologies, I missed your two points. "My question: #1 If there are 10 or more tables, do I need to read each table into memory, given that Spark is based on in-memory computation?" Every table will be read as I described above; the read is lazy in Spark. The computation happens when there is an action on
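
A minimal sketch of that laziness (the SparkSession and table paths below are placeholders): the reads only build the query plan, and nothing is materialized until the count() action runs.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("lazy-read-demo").getOrCreate()

    // These reads are lazy: Spark records the plan but loads no data yet.
    val tableA = spark.read.parquet("/data/table_a")  // placeholder path
    val tableB = spark.read.parquet("/data/table_b")  // placeholder path

    val joined = tableA.join(tableB, "id")            // still lazy

    // Only this action triggers reading and computation.
    println(joined.count())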

Re: Bridging gap between Spark UI and Code

2021-05-24 Thread mhawes
@Wenchen Fan, understood that the mapping of query plan to application code is very hard. I was wondering if we might be able to instead just handle the mapping from the final physical plan to the stage graph. So for example you’d be able to tell what part of the plan generated which stages. I
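
For reference, the final physical plan is already user-visible; a small sketch (df is a placeholder DataFrame) of the information the proposed mapping would link to stages:

    // Formatted physical plan with numbered nodes (Spark 3.x).
    df.explain("formatted")

    // Programmatic access to the executed plan that stages are built from.
    val physicalPlan = df.queryExecution.executedPlan
    println(physicalPlan.treeString)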

Re: About Spark executing SQL scripts

2021-05-24 Thread Mich Talebzadeh
Well, the Spark-to-BigQuery API is very efficient at what it needs to do. Personally, I have never found a JDBC connection to BigQuery that works under all circumstances. In a typical environment you need to set up your connection variables to BigQuery from Spark. These are my recommended ones
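
As a rough sketch of such a setup with the open-source spark-bigquery-connector (the package version and table name are illustrative; the exact options depend on your environment):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("bigquery-demo")
      // Match the connector version to your Spark/Scala build.
      .config("spark.jars.packages",
        "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.21.0")
      .getOrCreate()

    // Read a BigQuery table through the connector (placeholder table name).
    val df = spark.read
      .format("bigquery")
      .option("table", "my_project.my_dataset.my_table")
      .load()

    df.show(5)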

Re: Bridging gap between Spark UI and Code

2021-05-24 Thread Mich Talebzadeh
Plus, some operators can be repeated: if a node dies, Spark needs to rebuild that state from the RDD lineage. HTH, Mich
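
One hedged aside on limiting such recomputation (df is a placeholder DataFrame): persisting an expensive intermediate means a lost partition is rebuilt from the cached copy rather than by replaying the full lineage.

    import org.apache.spark.storage.StorageLevel

    // Replicated storage survives the loss of a single executor.
    val expensive = df.groupBy("key").count()
      .persist(StorageLevel.MEMORY_AND_DISK_2)

    // Downstream actions reuse the persisted data instead of replaying
    // the whole lineage when a partition is lost.
    expensive.count()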

Re: [VOTE] SPIP: Catalog API for view metadata

2021-05-24 Thread Ryan Blue
I don't think it makes sense to push discussion of a different approach into the PR rather than having it in the vote. Let's discuss it now, since that's the purpose of an SPIP. On Mon, May 24, 2021 at 11:22 AM John Zhuge wrote: > Hi everyone, I’d like to start a vote for the ViewCatalog design proposal >

Re: [Spark Core]: Adding support for size based partition coalescing

2021-05-24 Thread Tom Graves
So repartition() would look at some other config (spark.sql.adaptive.advisoryPartitionSizeInBytes) to decide the target partition size, then? Does it require AQE? If so, what does a repartition() call do if AQE is not enabled? This is essentially a new API, so would repartitionBySize
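
For context, the AQE knobs mentioned above are set as shown below (spark is a placeholder session); the repartitionBySize call is the hypothetical new API under discussion in this thread, not an existing method.

    // Existing AQE configuration (Spark 3.x).
    spark.conf.set("spark.sql.adaptive.enabled", "true")
    spark.conf.set("spark.sql.adaptive.advisoryPartitionSizeInBytes", "128MB")

    // Hypothetical API sketched in this thread -- not part of Spark today.
    // val resized = df.repartitionBySize("128MB")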

[VOTE] SPIP: Catalog API for view metadata

2021-05-24 Thread John Zhuge
Hi everyone, I’d like to start a vote for the ViewCatalog design proposal (SPIP). The proposal is to add a ViewCatalog interface that can be used to load, create, alter, and drop views in DataSourceV2. The full SPIP doc is here:
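
As a rough sketch of the shape such an interface might take (names and signatures here are assumptions for illustration, not the SPIP's exact definitions):

    import org.apache.spark.sql.connector.catalog.Identifier

    // Illustrative only; see the SPIP doc for the actual proposal.
    trait View {
      def sql: String  // the view's SQL text, plus schema, properties, ...
    }

    trait ViewCatalog {
      def loadView(ident: Identifier): View
      def createView(ident: Identifier, sql: String): View
      def alterView(ident: Identifier): View  // changes elided for brevity
      def dropView(ident: Identifier): Boolean
    }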

Re: SPIP: Catalog API for view metadata

2021-05-24 Thread John Zhuge
Great! I will start a vote thread. On Mon, May 24, 2021 at 10:54 AM Wenchen Fan wrote: > Yea let's move forward first. We can discuss the caching approach > and TableViewCatalog approach during the PR review. > > On Tue, May 25, 2021 at 1:48 AM John Zhuge wrote: > >> Hi everyone, >> >> Is

Re: SPIP: Catalog API for view metadata

2021-05-24 Thread Wenchen Fan
Yea let's move forward first. We can discuss the caching approach and TableViewCatalog approach during the PR review. On Tue, May 25, 2021 at 1:48 AM John Zhuge wrote: > Hi everyone, > > Is there any more discussion before we start a vote on ViewCatalog? With > FunctionCatalog merged, I hope

Re: SPIP: Catalog API for view metadata

2021-05-24 Thread John Zhuge
Hi everyone, Is there any more discussion before we start a vote on ViewCatalog? With FunctionCatalog merged, I hope this feature can complete the offerings of catalog plugins in 3.2. Once approved, I will refresh the WIP PR. Implementation details can be ironed out during review. Thanks, On

Re: About Spark executing SQL scripts

2021-05-24 Thread Wenchen Fan
It's not possible to load everything into memory. We should use a BigQuery connector (one should exist already?) and register tables B and C as temp views in Spark. On Fri, May 14, 2021 at 8:50 AM bo zhao wrote: > Hi Team, > > I've followed the Spark community for several years. This is my first
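
A minimal sketch of that suggestion (table names and connector options are placeholders): read the tables through the connector, register them as temp views, and let the SQL run in Spark.

    // Read the source tables through the BigQuery connector (placeholders).
    val tableB = spark.read.format("bigquery")
      .option("table", "my_project.my_dataset.table_b").load()
    val tableC = spark.read.format("bigquery")
      .option("table", "my_project.my_dataset.table_c").load()

    // Register them as temp views so SQL can reference them by name.
    tableB.createOrReplaceTempView("B")
    tableC.createOrReplaceTempView("C")

    spark.sql("SELECT B.id, C.value FROM B JOIN C ON B.id = C.id").show()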

Re: Purpose of OffsetHolder as a LeafNode?

2021-05-24 Thread Wenchen Fan
It's just an intermediate placeholder used to update the query plan in each micro-batch. On Sat, May 15, 2021 at 10:29 PM Jacek Laskowski wrote: > Hi, > > Just stumbled upon OffsetHolder [1] and am curious why it's a LeafNode? > What logical plan could it be part of? > > [1] >

Re: Secrets store for DSv2

2021-05-24 Thread Wenchen Fan
You can take a look at PartitionReaderFactory. It's created on the driver side, serialized, and sent to the executor side. For the write side, there is a similar channel: DataWriterFactory. On Wed, May 19, 2021 at 4:37 AM Andrew Melo wrote: > Hello, > > When implementing a DSv2 datasource, where
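
A hedged sketch of that pattern (the factory class and token field are made up for illustration): anything captured in the factory's constructor on the driver is serialized with it and is therefore readable on executors.

    import org.apache.spark.sql.catalyst.InternalRow
    import org.apache.spark.sql.connector.read.{InputPartition, PartitionReader, PartitionReaderFactory}

    // Hypothetical factory: `authToken` is captured on the driver and
    // travels to executors via serialization of this object.
    class MyReaderFactory(authToken: String) extends PartitionReaderFactory {
      override def createReader(partition: InputPartition): PartitionReader[InternalRow] = {
        // `authToken` is available here, on the executor.
        new PartitionReader[InternalRow] {
          override def next(): Boolean = false  // stub: produces no rows
          override def get(): InternalRow = throw new NoSuchElementException
          override def close(): Unit = ()
        }
      }
    }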

Re: [Spark Core]: Adding support for size based partition coalescing

2021-05-24 Thread Wenchen Fan
Ideally this should be handled by the underlying data source, producing a reasonably partitioned RDD as the input data. However, if we already have a poorly partitioned RDD at hand and want to repartition it properly, I think an extra shuffle is required so that we can know the partition size
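
That extra shuffle is essentially what AQE's post-shuffle coalescing exploits today; a rough illustration (df is a placeholder), assuming the shuffle comes from a wide operation rather than an explicit numbered repartition:

    spark.conf.set("spark.sql.adaptive.enabled", "true")
    spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
    spark.conf.set("spark.sql.adaptive.advisoryPartitionSizeInBytes", "128MB")

    // The shuffle from groupBy lets AQE observe map-output sizes and
    // coalesce the reduce-side partitions toward the advisory size.
    val sized = df.groupBy("key").count()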

Re: Bridging gap between Spark UI and Code

2021-05-24 Thread Wenchen Fan
I believe you can already see every plan change Spark applied to your query plan in the debug-level logs. I think it's hard to do in the web UI, as keeping all these historical query plans is expensive. Mapping the query plan to your application code is nearly impossible, as so many optimizations can
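
For reference, a sketch of surfacing those plan-change logs without raising the global log level (config names as of Spark 3.1, to the best of my knowledge):

    // Log rule-by-rule plan changes at WARN so they appear with default
    // log4j settings; optionally restrict logging to specific rules.
    spark.conf.set("spark.sql.planChangeLog.level", "WARN")
    // spark.conf.set("spark.sql.planChangeLog.rules",
    //   "org.apache.spark.sql.catalyst.optimizer.PushDownPredicates")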

Re: Bridging gap between Spark UI and Code

2021-05-24 Thread Will Raschkowski
This would be great. At least for logical nodes, would it be possible to re-use the existing Utils.getCallSite to populate a field when nodes are created? I suppose most value would come
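
A rough sketch of that idea inside Spark itself (the class and field are hypothetical; Utils.getCallSite is the internal helper RDDs already use to record their creation site):

    // Purely illustrative; Utils is private[spark], so this lives in Spark.
    import org.apache.spark.util.Utils

    abstract class LogicalNodeWithCallSite {
      // Hypothetical field capturing the user-code location at node
      // creation, mirroring RDD.creationSite.
      val callSite = Utils.getCallSite()
    }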

[VOTE] Release Spark 3.1.2 (RC1)

2021-05-24 Thread Dongjoon Hyun
Please vote on releasing the following candidate as Apache Spark version 3.1.2. The vote is open until May 27th 1AM (PST) and passes if a majority of +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 3.1.2 [ ] -1 Do not release this package because ...