Re: [External] : Re: Proposal for Decimal64 and Decimal128 value-based classes

2021-03-31 Thread Douglas Surber
My understanding is that IEEE decimal floating point is intended for currency. 
A large fraction of numeric values stored in databases are currency. It's not 
obvious to me why an e-commerce web site would not want to use Decimal128 to 
represent prices, extensions, taxes, discounts, totals, etc.

> On Mar 31, 2021, at 2:17 PM, Raffaello Giulietti 
> <raffaello.giulie...@gmail.com> wrote:
> 
> Hi Douglas,
> 
> yes, different vendors have different limits on the precision, the most 
> extreme probably being PostgreSQL.
> 
> But apart from that, the arithmetic is different.
> 
> A better option is to implement some optimized fixed precision classes like 
> SQLDecimal38 and SQLDecimal65 + a more general variable precision SQLDecimal. 
> But, as I mentioned, this is something different than Decimal.
> 
> 
> Greetings
> Raffaello
> 
> 
> 
> On 2021-03-31 22:53, Douglas Surber wrote:
>> Understood. The problem is that right now the only appropriate type for 
>> non-integer SQL numbers is BigDecimal. It's too big and too slow and lots of 
>> users avoid it.
>> Decimal128 supports 34 significant digits. The max precision of SQL numeric 
>> types varies from vendor to vendor. In SQL Server it is 38. In MySQL it is 
>> 65. So there is a huge range of values representable in SQL that are not 
>> representable in Decimal128. BUT, for the vast majority of applications that 
>> might be tempted to use Decimal128, those non-representable numbers don't 
>> occur. Currency amounts exceeding 34 decimal digits of precision are an 
>> almost non-existent minority.
>> Very few apps will pay the price of using BigDecimal even though it would 
>> support huge numbers exactly. Instead they find workarounds that are more 
>> efficient. Decimal128 would be a substantial improvement for those apps.
>> Douglas
>>> On Mar 31, 2021, at 1:13 PM, Raffaello Giulietti 
>>> <raffaello.giulie...@gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>> I think there's a misunderstanding about the nature of IEEE 754 Decimal 
>>> (e.g., Decimal64), the subject of this thread, and the nature of SQL 
>>> DECIMAL(p, s).
>>> 
>>> SQL DECIMAL(p, s) represents *fixed* point decimal numbers, with an overall 
>>> maximum precision p and a scale s, the number of digits to the right of the 
>>> decimal point (both parameters can be selected freely inside some ranges). 
>>> For example, DECIMAL(2, 1) can represent only the values
>>>    -9.9, -9.8, ..., 9.8, 9.9
>>> and that's it.
>>> Thus the sum 6.6 + 7.7 overflows, as does the product 2.9 * 4.
>>> 
>>> IEEE decimals are *floating* point decimal numbers. A hypothetical decimal 
>>> of precision 2 can represent values of the form c*10^q, where integer c 
>>> meets |c| < 100 (that is, max two digits) and integer q is limited in some 
>>> range. It covers the values above and much more, for example, 0.012 
>>> (=12*10^(-3)) and -3.4E2 (=-34*10^1).
>>> The sum 6.6 + 7.7 produces 14 because the mathematical result 14.3 is 
>>> rounded to the closest number of precision 2 (assuming 
>>> RoundingMode.HALF_EVEN). By the same token, the product 2.9 * 4 produces 
>>> 12, which is 11.6 rounded to 2 digits.
>>> But really, the position of the decimal point is floating.
>>> 
>>> IEEE decimals and SQL decimals are fundamentally different and have 
>>> different arithmetic, so I wouldn't recommend using the proposed classes 
>>> for JDBC.
>>> 
>>> On the positive side, SQL decimals are easier to implement if the maximum 
>>> allowed p in DECIMAL(p, s) is reasonable, say 38. But that's another topic.
>>> 
>>> 
>>> Greetings
>>> Raffaello



Re: [External] : Re: Proposal for Decimal64 and Decimal128 value-based classes

2021-03-31 Thread Douglas Surber
Understood. The problem is that right now the only appropriate type for 
non-integer SQL numbers is BigDecimal. It's too big and too slow and lots of 
users avoid it. 

Decimal128 supports 34 significant digits. The max precision of SQL numeric 
types varies from vendor to vendor. In SQL Server it is 38. In MySQL it is 65. 
So there is a huge range of values representable in SQL that are not 
representable in Decimal128. BUT, for the vast majority of applications that 
might be tempted to use Decimal128, those non-representable numbers don't 
occur. Currency amounts exceeding 34 decimal digits of precision are an almost 
non-existent minority.

Very few apps will pay the price of using BigDecimal even though it would 
support huge numbers exactly. Instead they find workarounds that are more 
efficient. Decimal128 would be a substantial improvement for those apps.

Douglas

> On Mar 31, 2021, at 1:13 PM, Raffaello Giulietti 
> <raffaello.giulie...@gmail.com> wrote:
> 
> Hi,
> 
> I think there's a misunderstanding about the nature of IEEE 754 Decimal 
> (e.g., Decimal64), the subject of this thread, and the nature of SQL 
> DECIMAL(p, s).
> 
> SQL DECIMAL(p, s) represents *fixed* point decimal numbers, with an overall 
> maximum precision p and a scale s, the number of digits to the right of the 
> decimal point (both parameters can be selected freely inside some ranges). 
> For example, DECIMAL(2, 1) can represent only the values
>    -9.9, -9.8, ..., 9.8, 9.9
> and that's it.
> Thus the sum 6.6 + 7.7 overflows, as does the product 2.9 * 4.
> 
> IEEE decimals are *floating* point decimal numbers. A hypothetical decimal of 
> precision 2 can represent values of the form c*10^q, where integer c meets 
> |c| < 100 (that is, max two digits) and integer q is limited in some range. 
> It covers the values above and much more, for example, 0.012 (=12*10^(-3)) 
> and -3.4E2 (=-34*10^1).
> The sum 6.6 + 7.7 produces 14 because the mathematical result 14.3 is rounded 
> to the closest number of precision 2 (assuming RoundingMode.HALF_EVEN). By 
> the same token, the product 2.9 * 4 produces 12, which is 11.6 rounded to 2 
> digits.
> But really, the position of the decimal point is floating.
> 
> IEEE decimals and SQL decimals are fundamentally different and have different 
> arithmetic, so I wouldn't recommend using the proposed classes for JDBC.
> 
> On the positive side, SQL decimals are easier to implement if the maximum 
> allowed p in DECIMAL(p, s) is reasonable, say 38. But that's another topic.
> 
> 
> Greetings
> Raffaello
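
For illustration, the precision-2 rounding behavior described above can be 
emulated with BigDecimal and an explicit MathContext. A minimal sketch; this 
is BigDecimal emulation, not the proposed Decimal64/128 API:

   import java.math.BigDecimal;
   import java.math.MathContext;
   import java.math.RoundingMode;

   // Emulate a decimal floating point format of precision 2, HALF_EVEN rounding.
   MathContext mc = new MathContext(2, RoundingMode.HALF_EVEN);
   System.out.println(new BigDecimal("6.6").add(new BigDecimal("7.7"), mc));    // 14
   System.out.println(new BigDecimal("2.9").multiply(new BigDecimal("4"), mc)); // 12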



Re: Proposal for Decimal64 and Decimal128 value-based classes

2021-03-31 Thread Douglas Surber
I'm sure this would be a huge disruption, but I'll throw it out anyway. I'd be 
perfectly happy if assigning null to a Decimal64/128 container was not allowed, 
whether it is a reference or a value.

I haven't followed the progress of Valhalla closely. It would be reasonable to 
delay Decimal64/128 until Valhalla so long as that isn't more than a very few 
cycles. My concern is that Valhalla is a challenging project. I would not want 
Decimal64/128 to get hung up because Valhalla is delayed or, even worse, canceled.

Douglas

> On Mar 31, 2021, at 8:01 AM, Maurizio Cimadamore 
> <maurizio.cimadam...@oracle.com> wrote:
> 
> 
> On 31/03/2021 15:23, Douglas Surber wrote:
>> Rather than waiting on Valhalla I would prefer that this project be fast 
>> tracked and added to OpenJDK ASAP.
> 
> There is a catch here.
> 
> While in principle, we can add these as value-based classes, and migrate to 
> Valhalla later, there is a big difference between doing it before and after.
> 
> When it comes to "migrated" primitive classes, there is a choice in how to 
> interpret the "old" utterances of the class name. Let's say that class Foo is 
> migrated to be a primitive class; does that mean that all uses of Foo in 
> existing programs will automatically get flattening? Or will references to Foo 
> be interpreted in a conservative fashion, so as to allow the same operations 
> as before? One important difference in semantics is assignment to `null`, 
> which is prohibited under flattened semantics, but allowed under "indirect" 
> (or by-reference, if you will) semantics.
> 
> In other words, under the current plan, if Decimal128 is added now and 
> migrated later, utterances of Decimal128 will behave like they used to 
> pre-Valhalla, and, to take advantage of flattening you would need to opt-in 
> with some keyword (e.g. Decimal128.val).
> 
> To me this is kind of a strong argument against going with these classes now 
> (as much as I understand how useful they'd be even w/o Valhalla) - and 
> preserving the "good" name (Decimal128) for the flattened case seems worth, 
> IMHO, waiting a few more cycles.
> 
> Maurizio
> 



Re: Proposal for Decimal64 and Decimal128 value-based classes

2021-03-31 Thread Douglas Surber
+1

JDBC would support this immediately. All it would take is the addition of a 
couple of lines in some appendices to require that conforming implementations 
of getObject(int, Class), setObject(int, Object, SQLType), etc., support 
Decimal64 and Decimal128. No change to the API would be required. Driver 
vendors could add this support the instant the types are available, with no 
need to wait for a change in the JDBC spec.
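
As a minimal sketch of that driver-side usage (hypothetical, since Decimal128 
is only a proposed class and does not yet exist in the JDK):

   // Hypothetical usage; Decimal128 is the proposed class, not an existing type.
   Decimal128 price = resultSet.getObject("PRICE", Decimal128.class);
   preparedStatement.setObject(1, price, JDBCType.NUMERIC);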

This would be a huge win for many Java apps. A large fraction of Java apps deal 
with money in some form. Binary floats are inappropriate for money and 
BigDecimal is too big and too slow.

Rather than waiting on Valhalla I would prefer that this project be fast tracked 
and added to OpenJDK ASAP.

Thanks for doing this.

Douglas

On Mar 30, 2021, at 10:10 PM, core-libs-dev-requ...@openjdk.java.net wrote:

Date: Tue, 30 Mar 2021 22:12:32 -0400
From: Brian Goetz <brian.go...@oracle.com>
To: Raffaello Giulietti <raffaello.giulie...@gmail.com>, Paul Sandoz <paul.san...@oracle.com>
Cc: core-libs-dev <core-libs-dev@openjdk.java.net>
Subject: Re: Proposal for Decimal64 and Decimal128 value-based classes
Message-ID: <64334a24-0e4c-57b8-b666-447ca3508...@oracle.com>
Content-Type: text/plain; charset=utf-8; format=flowed

They'll find a natural home in JDBC, since SQL has a native decimal type.

On 3/30/2021 7:05 PM, Raffaello Giulietti wrote:

As far as I can tell, scientific computation will make use of binary
floating point numbers for a long time. Decimal floating point numbers
are still limited to business and financial applications.




Re: RFR 8251989: Hex formatting and parsing

2020-09-03 Thread Douglas Surber
> Is there current hex formatting code you are aware of that already uses
> prefixes or suffixes on each byte?
> 

Java. My example. 

>   { 0x00, 0x01, 0x02, 0x0D, 0x0E, 0x0F }


Douglas

> On Sep 3, 2020, at 2:24 PM, core-libs-dev-requ...@openjdk.java.net wrote:
> 
> Date: Thu, 3 Sep 2020 16:13:49 -0400
> From: Roger Riggs <roger.ri...@oracle.com>
> To: core-libs-dev@openjdk.java.net
> Subject: Re: RFR 8251989: Hex formatting and parsing
> Message-ID: <7135b593-e48e-6931-4577-eb4350e51...@oracle.com>
> Content-Type: text/plain; charset=utf-8; format=flowed
> 
> Hi Douglas,
> 
> I think you answered your own question at the outset. (with the API as 
> it stands).
> 
> Is there current hex formatting code you are aware of that already uses
> prefixes or suffixes on each byte?
> 
> I'm evaluating whether there is consensus to change the semantics of 
> prefix and suffix.
> 
> Thanks, Roger
> 
> 



Re: RFR 8251989: Hex formatting and parsing

2020-09-02 Thread Douglas Surber
I still want to know how to format a byte sequence where each byte has a prefix 
(or suffix).

E.g.

{ 0x00, 0x01, 0x02, 0x0D, 0x0E, 0x0F }

Douglas

> On Sep 2, 2020, at 2:26 PM, core-libs-dev-requ...@openjdk.java.net wrote:
> 
> Date: Wed, 2 Sep 2020 16:26:07 -0400
> From: Roger Riggs <roger.ri...@oracle.com>
> To: core-libs-dev <core-libs-dev@openjdk.java.net>
> Subject: RFR 8251989: Hex formatting and parsing
> Message-ID: <180eca84-19fe-2dc4-9e57-1d05328e1...@oracle.com>
> Content-Type: text/plain; charset=utf-8; format=flowed
> 
> Please review the updated API in the Javadoc.
> http://cr.openjdk.java.net/~rriggs/hex-formatter/java.base/java/util/Hex.html 
> 
> 
> A few offline contacts encouraged me to explore expanding the support
> for formatting and parsing of primitive types.
> Adding formatting toHex methods and parsing fromHex methods makes the
> API easier to use.
> 
> For example, there are a number of protocols that need to escape char 
> values
> as hex. For symmetry across the primitive types, byte, char, short, int, 
> and long
> methods are added.
> 
> The handling of invalid digits was changed to consistently throw
> IllegalArgumentException; it is unlikely that encoding and decoding
> will run into invalid characters, and the previous technique of requiring
> the caller to check the value for negative numbers is error prone.
> 
> There is some danger of proliferating methods, so not all combinations
> are included. Feedback welcome.
> 
> With the API still shifting, the implementation is not ready to review.
> 
> Thanks, Roger
> 



Re: RFR 8251989: Hex formatter and parser utility

2020-08-27 Thread Douglas Surber
The meaning of prefix and suffix is not specified in formatter(boolean 
uppercase, String delimiter, String prefix, String suffix). It isn't specified 
whether they precede and follow the entire formatted value or each byte. The 
class comment clarifies this, but I shouldn't have to go there to discover it.

I was surprised at the meaning of prefix and suffix. They seem pointless to me. 
It is trivial to enclose the entire formatted value with a prefix and suffix 
without using these arguments. If they were a prefix and suffix for each 
individual byte, that would be much more useful. For example, how can I format 
a byte sequence like this?

0x00 0x01 0x02 0x0d 0x0e 0x0f

format(false, " 0x", "0x", "") 

doesn't work because an empty byte array would be

0x

instead of an empty string.
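
For reference, the API under review here eventually shipped as 
java.util.HexFormat in JDK 17, where withPrefix and withSuffix do apply to 
each byte; a short sketch of the formatting asked for above:

   import java.util.HexFormat;

   byte[] bytes = { 0x00, 0x01, 0x02, 0x0D, 0x0E, 0x0F };
   // The delimiter separates bytes; the prefix is applied to every byte.
   HexFormat fmt = HexFormat.ofDelimiter(" ").withPrefix("0x");
   System.out.println(fmt.formatHex(bytes));        // 0x00 0x01 0x02 0x0d 0x0e 0x0f
   System.out.println(fmt.formatHex(new byte[0]));  // empty string, not "0x"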

Douglas

> On Aug 27, 2020, at 4:55 AM, core-libs-dev-requ...@openjdk.java.net wrote:
> 
> Message: 1
> Date: Wed, 26 Aug 2020 21:34:47 -0400
> From: Roger Riggs <roger.ri...@oracle.com>
> To: core-libs-dev <core-libs-dev@openjdk.java.net>
> Subject: RFR 8251989: Hex formatter and parser utility
> Message-ID: <6378b60b-7a45-d8b0-5ebd-3d3bf9144...@oracle.com>
> Content-Type: text/plain; charset=utf-8; format=flowed
> 
> Please review updates to the formatting and parsing API based on the 
> last round of comments.
> There are many changes, so it may be useful to read it as a fresh draft.
> 
> - Rename classes: Encoder -> Formatter; Decoder -> Parser
> - Rename methods: encode -> format; decode -> parse, etc.
> - Rename factory methods to match
> - Added a factory method and re-arranged arguments to make it more
>   convenient to create uppercase formatters based on the existing uses.
> - The implementation has been updated based on the suggestions and API
>   changes
> 
> The webrev for applying the API to the security classes will be updated 
> when the API settles down.
> 
> JavaDoc:
> http://cr.openjdk.java.net/~rriggs/hex-formatter/java.base/java/util/Hex.html
> 
> Webrev:
> http://cr.openjdk.java.net/~rriggs/webrev-hex-formatter-8251989/
> 
> CSR:
> https://bugs.openjdk.java.net/browse/JDK-8251991
> 
> p.s.
> The previous (encoder/decoder) javadoc has been renamed to:
> http://cr.openjdk.java.net/~rriggs/hex-encoder-javadoc/
> 



Re: 8215441: Increase uniformity of the distribution of BigIntegers constructed by BigInteger(int, Random)

2018-12-20 Thread Douglas Surber
I wrote the following simple test case to look at the uniformity of the 
distribution. I don't see any problem running it up to 4096 buckets. Admittedly 
I did not do any statistical tests on the buckets, but by eye they look 
uniformly distributed.


import java.math.BigInteger;
import java.security.SecureRandom;

  // Pair was not included in the original posting; a minimal version is
  // assumed here so the test compiles.
  static final class Pair {
    final BigInteger limit;
    int count;
    Pair(BigInteger limit) { this.limit = limit; }
  }

  public static void main(String[] args) throws Throwable {
    SecureRandom sr = SecureRandom.getInstance("SHA1PRNG");
    final int nBits = 4096;
    final int nBuckets = 128;
    final int count = 10;
    Pair[] buckets = new Pair[nBuckets];
    BigInteger max = BigInteger.TWO.pow(nBits);
    BigInteger limit = max;
    BigInteger step = limit.divide(BigInteger.valueOf(buckets.length));
    // Bucket i covers the range [max - (i + 1) * step, max - i * step).
    for (int i = 0; i < buckets.length; i++) {
      buckets[i] = new Pair(limit = limit.subtract(step));
    }
    for (int i = 0; i < count; ++i) {
      BigInteger number = new BigInteger(nBits, sr);
      // Find the first bucket (searching from the highest range down)
      // whose lower limit is at or below number.
      int j;
      for (j = 0; buckets[j].limit.compareTo(number) > 0; j++) {}
      buckets[j].count++;
    }
    // Print the bucket counts from the lowest range up, eight per line.
    for (int i = buckets.length; i > 0; i--) {
      System.out.print(buckets[i - 1].count + (i % 8 == 0 ? "\n" : "\t"));
    }
    System.out.println();
  }


Douglas

java.text.SimpleDateFormat.parse should recognize America/Los_Angeles

2015-03-04 Thread Douglas Surber

java.text.SimpleDateFormat.parse does not recognize time zone ids.

   new SimpleDateFormat("yyyy-MM-dd HH:mm:ss zzzz")
       .parse("2015-03-03 09:25:00 America/Los_Angeles")

does not recognize America/Los_Angeles as a time zone. 
America/Los_Angeles is a time zone id, and the "zzzz" format looks 
for time zone names such as "PST" and "Pacific Standard Time". None 
of the various time zone formats (or at least none that I have tried) 
will recognize a time zone id, e.g. America/Los_Angeles.
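
For comparison, the java.time API introduced in JDK 8 does parse zone ids 
directly; a minimal sketch:

   import java.time.ZonedDateTime;
   import java.time.format.DateTimeFormatter;

   // The pattern letters VV parse a zone id such as America/Los_Angeles.
   DateTimeFormatter f = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss VV");
   ZonedDateTime zdt = ZonedDateTime.parse("2015-03-03 09:25:00 America/Los_Angeles", f);
   System.out.println(zdt);  // 2015-03-03T09:25-08:00[America/Los_Angeles]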


The code in question is in java.text.SimpleDateFormat.

  private int matchZoneString(String text, int start, String[] zoneNames) {
      for (int i = 1; i <= 4; ++i) {
          // Checking long and short zones [1 & 2],
          // and long and short daylight [3 & 4].
          String zoneName = zoneNames[i];
          if (text.regionMatches(true, start,
                                 zoneName, 0, zoneName.length())) {
              return i;
          }
      }
      return -1;
  }

The argument zoneNames is a 5 element array. Element 0 is a time zone 
id such as America/Los_Angeles and the next four are long and 
short, standard and daylight time names such as PST and Pacific 
Daylight Time. This array comes from 
java.text.DateFormatSymbols.getZoneStringsWrapper, which returns an 
array of such String[]s.


Changing the initial index in the for loop from 1 to 0 would include 
the zone id in the set of Strings to compare, and thus would match 
America/Los_Angeles or Europe/London. It would also be necessary 
to modify SimpleDateFormat.subParseZoneString to correctly set 
useSameName. A simple change would be to test whether nameIndex == 0, as 
the zone id is the same for standard and daylight time, though this 
might not be correct, as I haven't fully investigated how useSameName 
is used.
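
A minimal sketch of the proposed loop change (only the starting index differs 
from the code above):

  // Proposed: start at 0 so that element 0, the zone id
  // (e.g. America/Los_Angeles), is included in the match.
  for (int i = 0; i <= 4; ++i) {
      String zoneName = zoneNames[i];
      if (text.regionMatches(true, start,
                             zoneName, 0, zoneName.length())) {
          return i;
      }
  }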


What exactly the various format character sequences are supposed to 
recognize is sufficiently vague that no change to the spec appears 
necessary, though it might be beneficial.


A separate issue would be to consider the performance of doing 
regionMatches over this large String[][].


Douglas



Re: More memory-efficient internal representation for Strings: call for more data

2014-12-02 Thread Douglas Surber
String construction is a big performance issue for JDBC drivers. Most 
queries return some number of Strings. The overwhelming majority of 
those Strings will be short lived. The cost of constructing these 
Strings from network bytes is a large fraction of total execution 
time. Any increase in the cost of constructing a String will far 
outweigh any reduction in memory use, at least for query results.


All of the proposed compression methods require an additional scan of 
the entire string. That's exactly the wrong direction. Something like 
the following pseudo-code is common inside a driver.


  {
    char[] c = new char[n];
    for (int i = 0; i < n; i++) c[i] = charSource.next();
    return new String(c);
  }

The array copy inside the String constructor is a significant 
fraction of JDBC driver execution time. Adding an additional scan on 
top of it is making things worse regardless of the transient benefit 
of more compact storage. In the case of a query result the String 
will likely never be promoted out of new space; the benefit of 
compression would be minimal.


I don't dispute that Strings occupy a significant fraction of the 
heap or that a lot of those bytes are zero. And I certainly agree 
that reducing memory footprint is valuable, but any worsening of 
String construction time will likely be a problem.


Douglas

At 02:13 PM 12/2/2014, core-libs-dev-requ...@openjdk.java.net wrote:

Date: Wed, 03 Dec 2014 00:59:10 +0300
From: Aleksey Shipilev <aleksey.shipi...@oracle.com>
To: Java Core Libs <core-libs-dev@openjdk.java.net>
Cc: charlie hunt <charlie.h...@oracle.com>
Subject: More memory-efficient internal representation for Strings:
call for more data
Message-ID: <547e362e.5010...@oracle.com>
Content-Type: text/plain; charset=utf-8

Hi,

As you may already know, we are looking into more memory efficient
representation for Strings:
 https://bugs.openjdk.java.net/browse/JDK-8054307

As part of preliminary performance work for this JEP, we have to collect
the empirical data on usual characteristics of Strings and char[]-s
normal applications have, as well as figure out the early estimates for
the improvements based on that data. What we have so far is written up here:

 http://cr.openjdk.java.net/~shade/density/string-density-report.pdf

We would appreciate it if people who are interested in this JEP can
provide additional data on their applications. It is doubly interesting
to have the data for applications that process String data outside the
Latin1 plane. Our current data says these cases are rather rare. Please
read the current report draft, and try to process your own heap dumps
using the instructions in the Appendix.

Thanks,
-Aleksey.




Re: More memory-efficient internal representation for Strings: call for more data

2014-12-02 Thread Douglas Surber
The most common operation on most Strings in query results is to do 
nothing. Just construct the String, hold onto it while the rest of 
the transaction completes, then drop it on the floor. Probably the 
next most common is to encode the chars to write them to an 
OutputStream or send them back to the database. I'd be curious how a 
compact representation would help those operations.


SPECjEnterprise is a widely used standard benchmark. It probably uses 
mostly (or even entirely) ASCII characters so it's not representative 
of many customers.


My definition of "sane limits" might be different from yours. As far 
as I'm concerned String construction is already too slow and should 
be made faster by eliminating the char[] copy when possible.


Douglas

At 03:47 PM 12/2/2014, Aleksey Shipilev wrote:

Hi Douglas,

On 12/03/2014 02:24 AM, Douglas Surber wrote:
> String construction is a big performance issue for JDBC drivers. Most
> queries return some number of Strings. The overwhelming majority of
> those Strings will be short lived. The cost of constructing these
> Strings from network bytes is a large fraction of total execution time.
> Any increase in the cost of constructing a String will far outweigh any
> reduction in memory use, at least for query results.

You will also have to take into account that shorter (compressed)
Strings allow for more efficient operations on them. This is not to
mention the GC costs, which are also usually hidden from naive
performance estimations: even though you can perceive the mutator
spending more time doing work, that might be offset by an easier job
for the GC.

> All of the proposed compression methods require an additional scan of
> the entire string. That's exactly the wrong direction. Something like
> the following pseudo-code is common inside a driver.
> 
>   {
>     char[] c = new char[n];
>     for (int i = 0; i < n; i++) c[i] = charSource.next();
>     return new String(c);
>   }

Good to know. We will be assessing the String(char[]) construction
performance in the course of this performance work. What would you say
is a characteristic high-level benchmark for the scenario you are
describing?

> The array copy inside the String constructor is a significant fraction
> of JDBC driver execution time. Adding an additional scan on top of it is
> making things worse regardless of the transient benefit of more compact
> storage. In the case of a query result the String will likely never
> be promoted out of new space; the benefit of compression would be minimal.

It's hard to say at this point. We want to understand what footprint
improvements we are talking about. I agree that if cost-benefit analysis
says the performance is degrading beyond sane limits even if we
are happy with memory savings, there is little reason to push this into
the general JDK.

Thanks,
-Aleksey