Re: RFR: 8283681: Improve ZonedDateTime offset handling

2022-03-25 Thread Richard Startin
On Fri, 25 Mar 2022 12:28:58 GMT, Claes Redestad wrote: > Richard Startin prompted me to have a look at a case where java.time > underperforms relative to joda time > (https://twitter.com/richardstartin/status/1506975932271190017). > > It seems the java.time test of his su

Re: [VOTE] Apache Pinot 0.10.0 RC0

2022-03-24 Thread Richard Startin
+1 - verified sha512 hash - verified signature - verified git hash - verified contents based on git commit hash & the downloaded source code - verified LICENSE, NOTICE are correctly present - compiled the downloaded source code - ran quick start scripts On Mon, Mar 21, 2022 at 7:01 PM Sajjad

Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v2]

2022-03-08 Thread Richard Startin
On Mon, 7 Mar 2022 21:41:05 GMT, Richard Startin wrote: >> Ludovic Henry has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Add UTF-16 benchmarks > > Great to see this taken up. As it’s imple

Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v2]

2022-03-07 Thread Richard Startin
t through an enhancement to >> the autovectorizer, the complexity of doing it by hand is trivial and the >> gain is sizable (2x speedup) even without the Vector API. The algorithm has >> been proposed by Richard Startin and Paul Sandoz [1]. >> >> Speedup are a
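
A minimal sketch of what unrolling a polynomial hash by hand can look like. It assumes the usual base-31 hash over Latin-1 bytes and a four-way unroll. The class and method names are illustrative, and this is not the code from the pull request.

```java
// Illustrative only: names and the 4-way unroll factor are assumptions,
// not taken from the pull request under review.
final class UnrolledHash {

    // Reference version: one multiply-add per element, each step depending on the last.
    static int scalarHash(byte[] data) {
        int h = 0;
        for (byte b : data) {
            h = 31 * h + (b & 0xff);
        }
        return h;
    }

    // Unrolled version: four elements per iteration, with precomputed powers of 31.
    // Int overflow wraps identically in both versions, so the results agree.
    static int unrolledHash(byte[] data) {
        int h = 0;
        int i = 0;
        for (; i + 3 < data.length; i += 4) {
            h = 31 * 31 * 31 * 31 * h
              + 31 * 31 * 31 * (data[i] & 0xff)
              + 31 * 31 * (data[i + 1] & 0xff)
              + 31 * (data[i + 2] & 0xff)
              + (data[i + 3] & 0xff);
        }
        // Tail: fewer than four elements remain.
        for (; i < data.length; i++) {
            h = 31 * h + (data[i] & 0xff);
        }
        return h;
    }
}
```

The four products inside the unrolled loop do not depend on one another, so the loop-carried dependency shrinks to one multiply-add per four elements.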

Re: [VOTE] Apache Pinot 0.9.3 RC0

2021-12-24 Thread Richard Startin
+1 On Fri, Dec 24, 2021 at 10:17 AM Atri Sharma wrote: > +1 > > On Fri, 24 Dec 2021, 15:41 Xiang Fu wrote: > >> Hi Pinot Community, >> >> This is a call for a vote to release Apache Pinot 0.9.3. >> >> This is a bug-fix release that contains: >> - Upgrade log4j to 2.17.0 to address

Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)

2021-09-14 Thread Richard Startin
On Tue, 14 Sep 2021 10:57:17 GMT, Alan Bateman wrote: >>> Hi @iaroslavski I'm unconvinced that this work was from 14/06/2020 - I >>> believe this work derives from an unsigned radix sort I implemented on >>> 10/04/2021 >>>

Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)

2021-09-13 Thread Richard Startin
On Sat, 8 May 2021 20:54:48 GMT, iaroslavski wrote: > Sorting: > > - adopt radix sort for sequential and parallel sorts on int/long/float/double > arrays (almost random and length > 6K) > - fix tryMergeRuns() to better handle case when the last run is a single > element > - minor javadoc and
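
For readers unfamiliar with the technique under discussion, here is a minimal sketch of a least-significant-digit radix sort for signed int arrays. It is a generic textbook form with assumed names. It is not the code proposed for DualPivotQuicksort and omits refinements such as skipping passes.

```java
// Illustrative only: a generic byte-wise LSD radix sort, not the PR's implementation.
final class RadixSortSketch {

    static void sort(int[] a) {
        int[] src = a;
        int[] dst = new int[a.length];
        // Four passes, one per byte, least significant byte first.
        for (int shift = 0; shift < 32; shift += 8) {
            int[] counts = new int[257];
            // Flipping the sign bit makes signed order match unsigned byte order.
            for (int v : src) {
                counts[(((v ^ Integer.MIN_VALUE) >>> shift) & 0xff) + 1]++;
            }
            // Prefix sums turn the histogram into starting offsets per digit.
            for (int i = 0; i < 256; i++) {
                counts[i + 1] += counts[i];
            }
            // Stable scatter into the other buffer.
            for (int v : src) {
                dst[counts[((v ^ Integer.MIN_VALUE) >>> shift) & 0xff]++] = v;
            }
            int[] t = src;
            src = dst;
            dst = t;
        }
        // The number of passes is even, so the final sorted data ends up back in a.
    }
}
```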

Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)

2021-09-13 Thread Richard Startin
On Thu, 13 May 2021 10:22:57 GMT, Laurent Bourgès wrote: >> Hi @iaroslavski I'm unconvinced that this work was from 14/06/2020 - I >> believe this work derives from an unsigned radix sort I implemented on >> 10/04/2021 >>

Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)

2021-09-13 Thread Richard Startin
On Fri, 14 May 2021 07:14:27 GMT, Laurent Bourgès wrote: >> So the issue of not skipping passes was my fault in the translation process, >> so not something to worry about, though after [fixing >>

Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)

2021-09-13 Thread Richard Startin
On Thu, 13 May 2021 14:44:28 GMT, Richard Startin wrote: >> @iaroslavski I would prefer to discuss this in private than here, but my >> argument is that the name `skipByte` came from Laurent's code, and that >> Laurent's code was clearly derived from my own within a fork of

Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)

2021-09-13 Thread Richard Startin
On Thu, 13 May 2021 20:23:16 GMT, Richard Startin wrote: >> In private correspondence with Vladimir, it was explained that where >> Vladimir's code and Laurent's code are identical, including typos >> ([Vladimir's >> code](https://github.com/ia

Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)

2021-09-13 Thread Richard Startin
On Thu, 13 May 2021 11:31:49 GMT, iaroslavski wrote: >> Perhaps we can resolve this issue in private - my email address is on my >> profile (or in the commits in `radix-sort-benchmark`)? > > @richardstartin And one more addon: my first version of Radix sort, see my > github

Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)

2021-09-13 Thread Richard Startin
On Thu, 13 May 2021 11:47:58 GMT, Richard Startin wrote: >> @richardstartin And one more addon: my first version of Radix sort, see my >> github https://github.com/iaroslavski/sorting/tree/master/radixsort uses >> another name, like skipBytes, then renamed to passLevel. >

Re: RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort)

2021-09-13 Thread Richard Startin
On Wed, 12 May 2021 12:20:09 GMT, iaroslavski wrote: >> src/java.base/share/classes/java/util/DualPivotQuicksort.java line 47: >> >>> 45: * @author Doug Lea >>> 46: * >>> 47: * @version 2020.06.14 >> >> Vladimir, I would update to 2021.05.06 (+your hash) > > Laurent, the date in this class

Re: Reading data for a particular column-cell with 2 or more values of a same row-key

2017-02-25 Thread Richard Startin
If you operate directly on a Result you only get the latest version of each cell. To get older versions of cells you have a few options: 1) Result::getFamilyMap, if you only want versioned cells from a single family -

Re: Parallel Scanner

2017-02-20 Thread Richard Startin
For a client-only solution, have you looked at the RegionLocator interface? It gives you a list of pairs of byte[] (the start and stop keys for each region). You can easily use a ForkJoinPool recursive task or a Java 8 parallel stream over that list. I implemented a Spark RDD to do that and wrote
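
A rough sketch of the parallel-stream variant, assuming the per-region start/stop key pairs have already been fetched. `KeyRange`, `scanAll` and the `scanOne` callback are hypothetical names, and no HBase calls are made here. In a real client `scanOne` would run a scan bounded by the two keys.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Illustrative only: KeyRange and scanAll are hypothetical, not HBase API.
final class ParallelRangeScan {

    // Stand-in for one region's [start, stop) key pair.
    static final class KeyRange {
        final byte[] start;
        final byte[] stop;

        KeyRange(byte[] start, byte[] stop) {
            this.start = start;
            this.stop = stop;
        }
    }

    // Runs scanOne over every range in parallel and concatenates the results
    // in the encounter order of the input list.
    static <T> List<T> scanAll(List<KeyRange> ranges, Function<KeyRange, List<T>> scanOne) {
        return ranges.parallelStream()
                     .map(scanOne)
                     .flatMap(List::stream)
                     .collect(Collectors.toList());
    }
}
```

Parallel streams run on the common ForkJoinPool by default, so blocking I/O inside `scanOne` may call for a dedicated pool instead.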

Re: Doubt

2017-02-14 Thread Richard Startin
I took a look at https://www.linkedin.com/pulse/hbase-read-write-performance-conversation-gaurhari-dass?trk=prof-post Looks like an unattributed copy of

Re: Kryo On Spark 1.6.0

2017-01-10 Thread Richard Startin
Hi Enrico, Only set spark.kryo.registrationRequired if you want to forbid any classes you have not explicitly registered - see http://spark.apache.org/docs/latest/configuration.html.

Re: ToLocalIterator vs collect

2017-01-05 Thread Richard Startin
Why not do that with Spark SQL to utilise the executors properly, rather than a sequential filter on the driver? SELECT * FROM A LEFT JOIN B ON A.fk = B.fk WHERE B.pk IS NULL LIMIT k. If you were sorting just so you could iterate in order, this might save you a couple of sorts too.

Re: Lease exception

2016-12-21 Thread Richard Startin
> > As far as I understand when scanner.next() is called it will fetch no > of rows as in *hbase.client.scanner.caching*. When this fetching > process takes more than lease period it will close the scanner object. > so this exception occurring? > > > Thanks, > > Rajeshku

Re: Lease exception

2016-12-21 Thread Richard Startin
It means your lease on a region server has expired during a call to resultscanner.next(). This happens on a slow call to next(). You can either embrace it or "fix" it by making sure hbase.rpc.timeout exceeds hbase.regionserver.lease.period. https://richardstartin.com On 21 Dec 2016, at 11:30,

Re: withColumn gives "Can only zip RDDs with same number of elements in each partition" but not with a LIMIT on the dataframe

2016-12-20 Thread Richard Startin
I think limit repartitions your data into a single partition if called as a non-terminal operator. Hence zip works after limit because you only have one partition. In practice, I have found joins to be much more applicable than zip because of the strict limitation of identical partitions.

Re: Spark streaming completed batches statistics

2016-12-07 Thread Richard Startin
Ok it looks like I could reconstruct the logic in the Spark UI from the /jobs resource. Thanks. https://richardstartin.com/ From: map reduced <k3t.gi...@gmail.com> Sent: 07 December 2016 19:49 To: Richard Startin Cc: user@spark.apache.org Subject: Re:

Re: Spark streaming completed batches statistics

2016-12-07 Thread Richard Startin
Is there any way to get this information as CSV/JSON? https://docs.databricks.com/_images/CompletedBatches.png https://richardstartin.com/ From: Richard Startin <richardstar...@outlook.com> Se

Re: Back-pressure to Spark Kafka Streaming?

2016-12-05 Thread Richard Startin
I've seen the feature work very well. For tuning, you've got: spark.streaming.backpressure.pid.proportional (defaults to 1, non-negative) - weight for response to "error" (change between last batch and this batch) spark.streaming.backpressure.pid.integral (defaults to 0.2, non-negative) -
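
To make the two weights concrete, here is a generic sketch of a proportional-integral rate update. It is not Spark's estimator. The class, method and parameter names are hypothetical, and how the error terms are measured is left to the caller.

```java
// Illustrative only: a generic PI update, not Spark's rate estimator.
final class PidRateSketch {

    // currentRate:      rate currently allowed (e.g. records per second)
    // error:            how far the latest batch overshot what could be processed
    // accumulatedError: backlog-style error built up over earlier batches
    // proportional:     weight on the latest error
    // integral:         weight on the accumulated error
    // minRate:          floor that keeps the rate from collapsing to zero
    static double nextRate(double currentRate, double error, double accumulatedError,
                           double proportional, double integral, double minRate) {
        double adjusted = currentRate - proportional * error - integral * accumulatedError;
        return Math.max(minRate, adjusted);
    }
}
```

With the weights quoted above (1 and 0.2), a current rate of 1000, an error of 200 and an accumulated error of 100, the sketch lowers the rate to about 780.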

Spark streaming completed batches statistics

2016-12-05 Thread Richard Startin
Is there any way to get a more computer-friendly version of the completed batches section of the streaming page of the application master? I am very interested in the statistics and am currently screen-scraping... https://richardstartin.com

Re: Livy with Spark

2016-12-05 Thread Richard Startin
There is a great write-up on Livy at http://henning.kropponline.de/2016/11/06/ On 5 Dec 2016, at 14:34, Mich Talebzadeh wrote: Hi, Has there been any experience using Livy with Spark to share multiple Spark contexts? thanks Dr

Re: Storing XML file in Hbase

2016-11-28 Thread Richard Startin
on of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction. On 28 November 2016 at 16:04, Richard Startin <richardsta

Re: Storing XML file in Hbase

2016-11-28 Thread Richard Startin
Hi Mich, If you want to store the file whole, you'll need to enforce a 10MB limit on the file size, otherwise you will flush too often (each time the memstore fills up), which will slow down writes. Maybe you could deconstruct the XML by extracting columns from it using XPath? If the

Re: Hbase 1.2 connection pool in java

2016-11-24 Thread Richard Startin
Hi Manjeet, I wrote about a connection pool I implemented at https://richardstartin.com/2016/11/05/hbase-connection-management/ Cheers, Richard Sent from my iPhone > On 24 Nov 2016, at 17:43, Manjeet Singh wrote: > > Hi All > > Can anyone help me out on How to