Re: Should python-2 be supported in Spark 3.0?

2019-05-31 Thread Felix Cheung
Very subtle, but someone might take “We will drop Python 2 support in a future release in 2020” to mean any / first release in 2020, whereas the next statement indicates patch releases are not included in the above. It might help to reorder the items or clarify the wording.

Re: [pyspark 2.3+] Bucketing with sort - incremental data load?

2019-05-31 Thread Georg Heiler
Bucketing will only help you with joins, and these usually happen on a key. You mentioned that there is no such key in your data. If you just want to search through large quantities of data, sorting and partitioning by time is what is left. Rishi Shah wrote on Sat., 1 June 2019 at 05:57: > Thanks much for

Re: [pyspark 2.3+] Bucketing with sort - incremental data load?

2019-05-31 Thread Rishi Shah
Thanks much for your input, Gourav and Silvio. I have about 10TB of data, which gets stored daily. There's no qualifying column for partitioning, which makes querying this table super slow. So I wanted to sort the results before storing them daily. This is why I was thinking of using bucketing and

Re: Should python-2 be supported in Spark 3.0?

2019-05-31 Thread shane knapp
+1000 ;) On Sat, Jun 1, 2019 at 6:53 AM Denny Lee wrote: > +1 > > On Fri, May 31, 2019 at 17:58 Holden Karau wrote: > >> +1 >> >> On Fri, May 31, 2019 at 5:41 PM Bryan Cutler wrote: >> >>> +1 and the draft sounds good >>> >>> On Thu, May 30, 2019, 11:32 AM Xiangrui Meng wrote: >>> Here

Re: Should python-2 be supported in Spark 3.0?

2019-05-31 Thread Denny Lee
+1 On Fri, May 31, 2019 at 17:58 Holden Karau wrote: > +1 > > On Fri, May 31, 2019 at 5:41 PM Bryan Cutler wrote: > >> +1 and the draft sounds good >> >> On Thu, May 30, 2019, 11:32 AM Xiangrui Meng wrote: >> >>> Here is the draft announcement: >>> >>> === >>> Plan for dropping Python 2

Re: Should python-2 be supported in Spark 3.0?

2019-05-31 Thread Holden Karau
+1 On Fri, May 31, 2019 at 5:41 PM Bryan Cutler wrote: > +1 and the draft sounds good > > On Thu, May 30, 2019, 11:32 AM Xiangrui Meng wrote: > >> Here is the draft announcement: >> >> === >> Plan for dropping Python 2 support >> >> As many of you already knew, Python core development team and

Re: Should python-2 be supported in Spark 3.0?

2019-05-31 Thread Bryan Cutler
+1 and the draft sounds good On Thu, May 30, 2019, 11:32 AM Xiangrui Meng wrote: > Here is the draft announcement: > > === > Plan for dropping Python 2 support > > As many of you already knew, Python core development team and many > utilized Python packages like Pandas and NumPy will drop

Re: [pyspark 2.3+] Bucketing with sort - incremental data load?

2019-05-31 Thread Silvio Fiorito
Spark does allow appending new files to bucketed tables. When the data is read in, Spark will combine the multiple files belonging to the same buckets into the same partitions. Having said that, you need to be very careful with bucketing, especially as you're appending, to avoid generating lots

java.util.NoSuchElementException: Columns not found

2019-05-31 Thread Shyam P
Trying to save some sample data into a C* (Cassandra) table, I am getting the below error: *java.util.NoSuchElementException: Columns not found in table abc.company_vals: companyId, companyName* Though I have all the columns and have re-checked them again and again, I don't see any issue with the columns. I am using
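One common cause of this connector error is a naming or case mismatch between the DataFrame columns and the Cassandra table columns, since CQL lowercases unquoted identifiers. A minimal, connector-free sketch of the kind of check that surfaces it; both column lists are hypothetical:

```python
# Hypothetical column lists: df_cols as they appear in the DataFrame,
# table_cols as Cassandra reports them (unquoted identifiers are lowercased).
df_cols = ["companyId", "companyName", "revenue"]
table_cols = ["companyid", "companyname", "revenue"]

# Exact-name comparison: these "missing" columns are what triggers a
# "Columns not found" error even though the data is logically present.
missing_exact = [c for c in df_cols if c not in table_cols]

# Case-insensitive comparison shows the names do line up.
lowered = {t.lower() for t in table_cols}
missing_ci = [c for c in df_cols if c.lower() not in lowered]

print(missing_exact)  # the camelCase names, not found verbatim
print(missing_ci)     # empty: only the casing differs
```

If the case-insensitive list is empty but the exact list is not, renaming the DataFrame columns (or quoting the identifiers when creating the table) is the usual fix.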

Re: dynamic allocation in spark-shell

2019-05-31 Thread Deepak Sharma
You can start spark-shell with these properties: --conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.initialExecutors=2 --conf spark.dynamicAllocation.minExecutors=2 --conf spark.dynamicAllocation.maxExecutors=5 On Fri, May 31, 2019 at 5:30 AM Qian He wrote: > Sometimes
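The flags above combine into a single invocation; a sketch, with illustrative executor counts, and with one addition: on YARN with Spark 2.x, dynamic allocation also typically requires the external shuffle service to be enabled.

```shell
spark-shell \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.initialExecutors=2 \
  --conf spark.dynamicAllocation.minExecutors=2 \
  --conf spark.dynamicAllocation.maxExecutors=5 \
  --conf spark.shuffle.service.enabled=true
```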

Re: [pyspark 2.3+] Bucketing with sort - incremental data load?

2019-05-31 Thread Gourav Sengupta
Hi Rishi, I think that if you are sorting and then appending data locally, there will be no need to bucket the data, and you are good with external tables that way. Regards, Gourav On Fri, May 31, 2019 at 3:43 AM Rishi Shah wrote: > Hi All, > > Can we use bucketing with sorting functionality to