[GitHub] [commons-net] nikunjb commented on issue #41: [NET-405] Support for IPv6 in SubnetUtils
nikunjb commented on issue #41: [NET-405] Support for IPv6 in SubnetUtils URL: https://github.com/apache/commons-net/pull/41#issuecomment-503888146 +1 to get this merged please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [commons-pool] coveralls commented on issue #21: Adding power support
coveralls commented on issue #21: Adding power support URL: https://github.com/apache/commons-pool/pull/21#issuecomment-503885206 [![Coverage Status](https://coveralls.io/builds/24100344/badge)](https://coveralls.io/builds/24100344) Coverage increased (+0.03%) to 85.213% when pulling **261ade4f4ecf2a2af861fd21419a04aca9491b09 on ghatwala:master** into **a2ebe6c4167ad6ccd6f7b1b6782eb2a925ecf679 on apache:master**.
[GitHub] [commons-csv] coveralls commented on issue #45: adding power support
coveralls commented on issue #45: adding power support URL: https://github.com/apache/commons-csv/pull/45#issuecomment-503882889 [![Coverage Status](https://coveralls.io/builds/24100275/badge)](https://coveralls.io/builds/24100275) Coverage remained the same at 92.829% when pulling **8f94949f8fb9202f202beedbd69a5668427d1485 on ghatwala:master** into **7754cd4c84299e72043067501d2965f55e7ff769 on apache:master**.
[GitHub] [commons-csv] ghatwala opened a new pull request #45: adding power support
ghatwala opened a new pull request #45: adding power support URL: https://github.com/apache/commons-csv/pull/45 To ensure this repo builds on ppc64le, added support here.
[GitHub] [commons-pool] ghatwala opened a new pull request #21: Adding power support
ghatwala opened a new pull request #21: Adding power support URL: https://github.com/apache/commons-pool/pull/21 To ensure this repo builds on ppc64le, added the support here.
[jira] [Commented] (RNG-104) SeedFactory seed creation performance analysis
[ https://issues.apache.org/jira/browse/RNG-104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868055#comment-16868055 ] Alex D Herbert commented on RNG-104:
The array can be filled in blocks, where the last block may be incomplete. Here's some example code I used:
{code:java}
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import org.apache.commons.rng.UniformRandomProvider;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Threads;

// SeedRandomSources and TestSizes are JMH state classes from the benchmark (not shown here).
private static final ReentrantLock UNFAIR_LOCK = new ReentrantLock(false);

// Fills array[start, end) while holding the lock once for the whole block.
private static void nextInt(Lock lock, UniformRandomProvider rng, int[] array, int start, int end) {
    lock.lock();
    try {
        for (int i = start; i < end; i++) {
            array[i] = rng.nextInt();
        }
    } finally {
        lock.unlock();
    }
}

@Benchmark
@Threads(4)
public int[] Threads4_createIntArraySeedBlocks_UnfairLock(SeedRandomSources sources, TestSizes sizes) {
    final UniformRandomProvider rng = sources.getGenerator();
    final int[] seed = new int[sizes.getSize()];
    for (int i = 0; i < seed.length; i += sizes.getBlockSize()) {
        nextInt(UNFAIR_LOCK, rng, seed, i, Math.min(i + sizes.getBlockSize(), seed.length));
    }
    return seed;
}
{code}
Similar code was used for the synchronized version. I will commit the tests used for this ticket to the JMH examples module.
The code is sub-optimal as the Math.min is only required on the final iteration. It won't affect the benchmark comparison as all variants used this method, but removing the min when the block size is a constant would allow the JVM to unroll the main fill loop. So I may modify it to do the loop without the min and then a final fill at the end, as is done in {{IntProvider.nextBytesFill}} for example.
{quote}How does it impact the time before wrapping around? {quote}
Given the smiley face, do you mean how long before the 2^1024 period repeats? The calculation was based on outputting 2^30 deviates per second (about one every 0.93 nanoseconds), and the data for single {{long}} generation shows an output takes about 15 nanoseconds. That is roughly 16 times slower, so the 2^969 years become roughly 2^973 years. I think this generator is fine for the intended usage.
> SeedFactory seed creation performance analysis > -- > > Key: RNG-104 > URL: https://issues.apache.org/jira/browse/RNG-104 > Project: Commons RNG > Issue Type: Task > Components: simple >Affects Versions: 1.3 >Reporter: Alex D Herbert >Assignee: Alex D Herbert >Priority: Minor > Attachments: t1.jpg, t4.jpg, well_lock_vs_sync1.png, > well_lock_vs_sync4.jpg, well_sync_performance.jpg, > well_unfair_performance.jpg, well_unfair_vs_fair.jpg, xor_unfair_int1.png, > xor_unfair_long1.png > > > The SeedFactory is used to create seeds for the random generators. To ensure > thread safety this uses synchronized blocks around a single generator. The > current method only generates a single int or long per synchronisation. > Analyze the performance of this approach. The analysis will investigate > generating multiple values inside each synchronisation around the generator. > This analysis will also investigate methods to supplement the SeedFactory > with fast methods to create seeds. This will use a fast seeding method to > generate a single long value. This can be a seed for a SplitMix generator > used to create a seed of any length. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
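The refactoring mentioned in the comment above (a main loop over complete blocks without {{Math.min}}, followed by one fill for the remainder) might look roughly like the sketch below. This is an illustration only, not the code committed to the JMH examples module: the class name {{BlockFill}}, the method {{createSeed}} and the use of {{java.util.function.IntSupplier}} in place of {{UniformRandomProvider}} are assumptions made so that the example is self-contained, and the block size is assumed to be positive.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.IntSupplier;

/** Hypothetical helper; IntSupplier stands in for UniformRandomProvider::nextInt. */
final class BlockFill {
    private static final Lock UNFAIR_LOCK = new ReentrantLock(false);

    /** Fills array[start, end) while holding the lock once. */
    private static void fill(Lock lock, IntSupplier source, int[] array, int start, int end) {
        lock.lock();
        try {
            for (int i = start; i < end; i++) {
                array[i] = source.getAsInt();
            }
        } finally {
            lock.unlock();
        }
    }

    /** Creates a seed of the given size, locking once per block (blockSize must be > 0). */
    static int[] createSeed(IntSupplier source, int size, int blockSize) {
        final int[] seed = new int[size];
        // Index of the first element that is not part of a complete block.
        final int fullEnd = size - (size % blockSize);
        int i = 0;
        // Main loop: every block is complete, so no Math.min is needed.
        for (; i < fullEnd; i += blockSize) {
            fill(UNFAIR_LOCK, source, seed, i, i + blockSize);
        }
        // Final fill for the incomplete last block, if any.
        if (i < size) {
            fill(UNFAIR_LOCK, source, seed, i, size);
        }
        return seed;
    }
}
```

With a constant block size the bounds of the main loop are fixed, which is the condition the comment gives for the JVM being able to unroll it.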
[jira] [Commented] (RNG-104) SeedFactory seed creation performance analysis
[ https://issues.apache.org/jira/browse/RNG-104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16868021#comment-16868021 ] Gilles commented on RNG-104: Thanks. +1 to use XorShift1024SPhi +1 to synchronize on the recommended block size. I guess that if fewer values are needed than the block size, the rest is discarded (?). How does it impact the time before wrapping around? ;-)
[GitHub] [commons-vfs] jclx commented on issue #27: SFTP: Perform less strict permission checks on servers where the exec channel is disabled
jclx commented on issue #27: SFTP: Perform less strict permission checks on servers where the exec channel is disabled URL: https://github.com/apache/commons-vfs/pull/27#issuecomment-503741348 I see this was closed. I couldn't find anything else about this issue, but our SFTP servers disallow exec channel calls like 'id -G' and return the string "This service allows sftp connections only." The call then fails with an exception, so nothing that uses isWritable, isReadable or isExecutable works. VFS SFTP is unusable for us. Is there anything in the works, or an issue for this, so that VFS can handle this case at a future date? Our current workaround is to create our own SFTP provider as a subclass and disable exec commands.
[GitHub] [commons-net] tadhgpearson commented on issue #41: [NET-405] Support for IPv6 in SubnetUtils
tadhgpearson commented on issue #41: [NET-405] Support for IPv6 in SubnetUtils URL: https://github.com/apache/commons-net/pull/41#issuecomment-503721982 This functionality would be super-useful!
[jira] [Created] (LANG-1466) new property "omitNullValues" ToStringBuilder
nimo mayr created LANG-1466: --- Summary: new property "omitNullValues" ToStringBuilder Key: LANG-1466 URL: https://issues.apache.org/jira/browse/LANG-1466 Project: Commons Lang Issue Type: Improvement Components: lang.builder.* Affects Versions: 3.9 Reporter: nimo mayr
Currently, the Commons Lang {{ReflectionToStringBuilder}} *cannot exclude null values*:
{code:java}
return ToStringBuilder.reflectionToString(object, ToStringStyle.MULTI_LINE_STYLE);
{code}
Please provide a new property "omitNullValues" to exclude null values from the "object" string.
{code:java}
boolean omitNullValues = true;
return ToStringBuilder.reflectionToString(object, ToStringStyle.MULTI_LINE_STYLE, omitNullValues);
{code}
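To make the requested behaviour concrete, here is a minimal plain-Java sketch of a reflective toString that skips null fields. It does not use or reproduce Commons Lang: the class {{NullSkippingToString}}, its {{toString}} method and the nested {{Person}} type are hypothetical names for illustration, and only fields declared directly on the object's class are inspected. Note also that, as far as I know, {{ToStringBuilder.reflectionToString(Object, ToStringStyle, boolean)}} already exists with the boolean meaning "output transients", so the signature proposed in the report would probably need a different shape; this is worth checking against the Javadoc.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.StringJoiner;

/** Hypothetical illustration of an "omit null values" option; not the Commons Lang API. */
final class NullSkippingToString {

    /** Example type with one field that may be null. */
    static final class Person {
        private final String name;
        private final String nickname;

        Person(String name, String nickname) {
            this.name = name;
            this.nickname = nickname;
        }
    }

    /** Builds "Type[field=value,...]" from declared instance fields, optionally skipping nulls. */
    static String toString(Object object, boolean omitNullValues) {
        final Class<?> type = object.getClass();
        final StringJoiner joiner = new StringJoiner(",", type.getSimpleName() + "[", "]");
        for (Field field : type.getDeclaredFields()) {
            // Skip static and compiler-generated fields.
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue;
            }
            field.setAccessible(true);
            final Object value;
            try {
                value = field.get(object);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
            if (value == null && omitNullValues) {
                continue; // the behaviour requested in the issue
            }
            joiner.add(field.getName() + "=" + value);
        }
        return joiner.toString();
    }
}
```

Calling {{NullSkippingToString.toString(new NullSkippingToString.Person("Ada", null), true)}} leaves the {{nickname}} field out of the result, while passing {{false}} includes it as {{nickname=null}}.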
[jira] [Comment Edited] (MATH-1490) Percentile computational accuracy issue
[ https://issues.apache.org/jira/browse/MATH-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867875#comment-16867875 ] Virendra Singh edited comment on MATH-1490 at 6/19/19 6:06 PM: --- The Test case given by you in *#MATH-1491* can be resolved using delta value *1e-15* but, it is not working in this case! This is the error I found: {color:#ff}_java.lang.AssertionError: expected:<68.95> but was:<68.94>_{color} {color:#33}*1e-14* should work in this case, but it's not working. While *1e-13* is working{color} was (Author: virendrasinghrp): The Test case given by you in *#MATH-1491* can be resolved * *using delta value *1e-15* but, it is not working in this case! This is the error I found: {color:#ff}_java.lang.AssertionError: expected:<68.95> but was:<68.94>_{color} {color:#33}*1e-14* should work in this case, but it's not working. While *1e-13* is working{color} > Percentile computational accuracy issue > --- > > Key: MATH-1490 > URL: https://issues.apache.org/jira/browse/MATH-1490 > Project: Commons Math > Issue Type: Bug >Affects Versions: 3.4, 3.4.1, 3.5, 3.6, 3.6.1 > Environment: System: Linux testinglab 4.4.0-131-generic > #157~14.04.1-Ubuntu > Java version "1.8.0_191" > Java(TM) SE Runtime Environment (build 1.8.0_191-b12) > Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode) > > >Reporter: Lingchao Chen >Priority: Major > Labels: performance > Attachments: BugDemo.java > > > The percentile method works well on the older versions, e.g., the version > before 3.4. However, when I update commons-math to the newer version, there > produces a computational accuracy issue. There is a backward compatibility > bug behind it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
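One possible explanation for why a delta of 1e-14 fails here while 1e-13 passes: the spacing between adjacent {{double}} values (one ulp) grows with magnitude. Near 68.95 one ulp is 2^-46, about 1.42e-14, which is already larger than 1e-14; near 7.55 it is 2^-50, about 8.9e-16, which fits inside 1e-15. The sketch below assumes the two library versions differ by exactly one ulp, which the truncated values in the report do not confirm. {{UlpCheck}} and {{withinDelta}} are hypothetical names; {{withinDelta}} mirrors the absolute-difference rule used by JUnit's {{assertEquals(expected, actual, delta)}}.

```java
/** Hypothetical demo: how large a one-ulp difference is at different magnitudes. */
final class UlpCheck {

    /** Same rule as JUnit's assertEquals(double, double, double): |expected - actual| <= delta. */
    static boolean withinDelta(double expected, double actual, double delta) {
        return Math.abs(expected - actual) <= delta;
    }

    public static void main(String[] args) {
        final double expected = 68.95;
        // Assumption: the newer version returns the neighbouring double.
        final double actual = Math.nextDown(expected);

        System.out.println(Math.ulp(68.95)); // 2^-46, about 1.42e-14
        System.out.println(Math.ulp(7.55));  // 2^-50, about 8.9e-16

        System.out.println(withinDelta(expected, actual, 1e-15)); // false
        System.out.println(withinDelta(expected, actual, 1e-14)); // false
        System.out.println(withinDelta(expected, actual, 1e-13)); // true

        // At the magnitude of the MATH-1491 test case a one-ulp gap fits inside 1e-15.
        System.out.println(withinDelta(7.55, Math.nextUp(7.55), 1e-15)); // true
    }
}
```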
[jira] [Comment Edited] (MATH-1490) Percentile computational accuracy issue
[ https://issues.apache.org/jira/browse/MATH-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867875#comment-16867875 ] Virendra Singh edited comment on MATH-1490 at 6/19/19 6:04 PM: --- The Test case given by you in *#MATH-1491* can be resolved using delta value *1e-15* but, it is not working in this case! This is the error I found: {color:#ff}_java.lang.AssertionError: expected:<68.95> but was:<68.94>_{color} {color:#33}*1e-14* should work in this case, but it's not working. While *1e-13* is working{color} was (Author: virendrasinghrp): The Test case given by you in *#MATH-1491* can be resolved using delta value *1e-15* but, it is not working in this case! This is the error I found: {color:#FF}_java.lang.AssertionError: expected:<68.95> but was:<68.94>_{color} {color:#33}*1e-15* should work in this case also, but it's not working.{color}
[jira] [Commented] (MATH-1490) Percentile computational accuracy issue
[ https://issues.apache.org/jira/browse/MATH-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867875#comment-16867875 ] Virendra Singh commented on MATH-1490: -- The Test case given by you in *#MATH-1491* can be resolved using delta value *1e-15* but, it is not working in this case! This is the error I found: {color:#FF}_java.lang.AssertionError: expected:<68.95> but was:<68.94>_{color} {color:#33}*1e-15* should work in this case also, but it's not working.{color}
[jira] [Commented] (MATH-1491) Percentile computational accuracy issue
[ https://issues.apache.org/jira/browse/MATH-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867867#comment-16867867 ] Virendra Singh commented on MATH-1491: -- Hi [~lingchao], I looked into your issue, I checked it on version 2.2 & 3.6 with your Test case. I found that version 2.2 gives the Percentile value = *7.55* while the version 3.6 gives the Percentile value = *7.551*(for this particular test case). It can be solved if you add delta/tolerance *1e-15* in assertEquals(expected,actual,delta) method. > Percentile computational accuracy issue > --- > > Key: MATH-1491 > URL: https://issues.apache.org/jira/browse/MATH-1491 > Project: Commons Math > Issue Type: Bug >Affects Versions: 3.4, 3.4.1, 3.5, 3.6, 3.6.1 > Environment: System: Linux testinglab 4.4.0-131-generic > #157~14.04.1-Ubuntu > Java version "1.8.0_191" > Java(TM) SE Runtime Environment (build 1.8.0_191-b12) > Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode) > > >Reporter: Lingchao Chen >Priority: Major > Labels: performance > Attachments: BugDemo.java > > > Hi, > The percentile method works well on the older versions, e.g., the version > before 3.4. However, when I update commons-math to the newer version, there > produces a computational accuracy issue. There is a backward compatibility > bug behind it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
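A fixed absolute delta such as 1e-15 only suits values of a particular magnitude. As an alternative, a tolerance can be scaled to the values being compared, for example as a number of units in the last place (ulps). The helper below is a minimal sketch of that idea under my own naming ({{UlpTolerance}}, {{equalsWithinUlps}}); it is not the test from the attached BugDemo.java and not a Commons Math method.

```java
/** Hypothetical helper: compares doubles with a tolerance scaled to their magnitude. */
final class UlpTolerance {

    /** Returns true if a and b differ by at most maxUlps units in the last place. */
    static boolean equalsWithinUlps(double a, double b, int maxUlps) {
        if (Double.isNaN(a) || Double.isNaN(b)) {
            return false;
        }
        // Use the larger of the two spacings so the check is symmetric in a and b.
        final double tolerance = maxUlps * Math.max(Math.ulp(a), Math.ulp(b));
        return Math.abs(a - b) <= tolerance;
    }

    public static void main(String[] args) {
        System.out.println(equalsWithinUlps(7.55, Math.nextUp(7.55), 1));     // true
        System.out.println(equalsWithinUlps(68.95, Math.nextDown(68.95), 1)); // true
        System.out.println(equalsWithinUlps(7.55, 7.56, 4));                  // false
    }
}
```

The design choice is that the same {{maxUlps}} value behaves consistently whether the percentile is near 7.55 or near 68.95, which an absolute delta does not.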
[jira] [Issue Comment Deleted] (MATH-1490) Percentile computational accuracy issue
[ https://issues.apache.org/jira/browse/MATH-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Virendra Singh updated MATH-1490: - Comment: was deleted (was: Hi [~lingchao], I looked into your issue, I checked it on version 2.2 & 3.6 with your Test case. I found that version 2.2 gives the Percentile value = *7.55* while the version 3.6 gives the Percentile value = *7.551*(for this particular test case). It can be solved if you add delta/tolerance *1e-15* in assertEquals(expected,actual,delta) method.)
[jira] [Commented] (RNG-104) SeedFactory seed creation performance analysis
[ https://issues.apache.org/jira/browse/RNG-104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867831#comment-16867831 ] Alex D Herbert commented on RNG-104:
For completeness I have added the use of the fair and unfair lock to the performance table for creation of a single int or long using various thread-safe methods:
||Type||Method||1 thread||4 threads||
|int|ThreadLocalRandom_nextInt|1.39|1.67|
|int|ThreadLocalSplitMix_nextInt|4.03|33.08|
|int|ThreadLocalRNG_nextInt|4.28|24.42|
|int|ThreadLocalSequenceMix_nextInt|4.41|20.82|
|int|AtomicInt_getAndIncrement|4.82|62.61|
|int|volatileInt_increment|4.83|143.45|
|int|SyncSplitMix_nextInt|6.54|69.60|
|int|Random_nextInt|7.48|326.74|
|int|FairLock_XorShift1024StarPhi_nextInt|15.05|42,808.57|
|int|UnfairLock_XoRoShiRo128Plus_nextInt|15.15|212.06|
|int|UnfairLock_XorShift1024StarPhi_nextInt|15.25|233.13|
|int|FairLock_XoRoShiRo128Plus_nextInt|15.36|43,605.65|
|int|Sync_XorShift1024StarPhi_nextInt|17.04|312.60|
|int|Sync_XoRoShiRo128Plus_nextInt|17.21|350.46|
|int|Sync_Well44497b_nextInt|18.67|494.17|
|int|UnfairLock_Well44497b_nextInt|19.74|320.05|
|int|FairLock_Well44497b_nextInt|20.04|43,465.64|
|int|System_identityHashCode|38.02|41.17|
|int|SeedFactory_createInt|54.56|821.41|
|long|ThreadLocalRandom_nextLong|1.39|1.66|
|long|ThreadLocalSplitMix_nextLong|4.00|26.13|
|long|ThreadLocalRNG_nextLong|4.41|4.86|
|long|ThreadLocalSequenceMix_nextLong|4.57|21.31|
|long|AtomicLong_getAndIncrement|4.83|69.25|
|long|volatileLong_increment|4.83|148.76|
|long|SyncSplitMix_nextLong|6.58|87.68|
|long|UnfairLock_XoRoShiRo128Plus_nextLong|14.38|196.20|
|long|UnfairLock_XorShift1024StarPhi_nextLong|14.67|255.93|
|long|FairLock_XorShift1024StarPhi_nextLong|15.18|43,880.36|
|long|FairLock_XoRoShiRo128Plus_nextLong|15.19|42,316.21|
|long|Sync_XorShift1024StarPhi_nextLong|16.77|294.09|
|long|Sync_XoRoShiRo128Plus_nextLong|16.78|332.61|
|long|Random_nextLong|17.07|500.79|
|long|FairLock_Well44497b_nextLong|29.00|43,504.11|
|long|UnfairLock_Well44497b_nextLong|29.11|342.99|
|long|Sync_Well44497b_nextLong|29.31|442.95|
|long|SeedFactory_createLong|63.79|777.90|
|long|System_nanoTime|564.86|2,191.90|
|long|System_currentTimeMillis|564.91|2,191.36|
There is some inconsistency with the use of ThreadLocal<> on 4 threads. Sometimes it is almost as fast as a single thread (see ThreadLocalRNG_nextLong). At other times it is a lot slower but still among the fastest methods, just nowhere close to ThreadLocalRandom.
The use of a fair lock policy on multiple threads is very slow. On a single thread it is close to the unfair lock policy.
Here the use of the {{ReentrantLock}} is faster than using a synchronized block around a single generator, both on a single thread and on multiple threads.
Don't use System nanoTime or currentTimeMillis. They are slow and not random.
One item to note is that the use of an atomic long as the state for a split mix algorithm is faster than using a ReentrantLock around a generator. So this would be a candidate for a thread-safe fast seed generator for single int or long values with minimal coding. It would have a period of only 2^64 but no start-up cost. A bit more coding would create the better performing ThreadLocal<> variants. These have a start-up cost for creation on a thread but perform better when more than a single value is required on the thread. The recommendation would be to use ThreadLocalRandom for fast local seeding, which effectively does the same thing (but requires Java 1.7).
This table can be updated if the SeedFactory is changed as per the recommendations above. Currently it is slow as it combines synchronisation with the System_identityHashCode method.
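The comment above suggests an atomic long as the state of a split mix algorithm as a thread-safe seed source. A minimal sketch of that idea follows. It is not the benchmarked code: the class name {{AtomicSplitMixSeeder}} and its constructor are hypothetical, and the increment and mixing constants are those of the widely published SplitMix64 algorithm.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical thread-safe seed source: an AtomicLong counter mixed with the SplitMix64 finaliser. */
final class AtomicSplitMixSeeder {
    /** The SplitMix64 increment (golden ratio scaled to 64 bits). */
    private static final long GOLDEN_GAMMA = 0x9e3779b97f4a7c15L;

    private final AtomicLong state;

    AtomicSplitMixSeeder(long initialState) {
        this.state = new AtomicLong(initialState);
    }

    /** Atomically advances the state and mixes it; no lock is required. */
    long nextLong() {
        long z = state.addAndGet(GOLDEN_GAMMA);
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    /** Uses the upper 32 bits of the mixed value. */
    int nextInt() {
        return (int) (nextLong() >>> 32);
    }
}
```

Because the state is a 64-bit counter the period is 2^64, matching the limitation noted in the comment. Each call performs one atomic add and a few arithmetic operations, so there is no start-up cost beyond choosing the initial state.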
[jira] [Commented] (RNG-104) SeedFactory seed creation performance analysis
[ https://issues.apache.org/jira/browse/RNG-104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867772#comment-16867772 ] Alex D Herbert commented on RNG-104: I have done some more analysis of seed array creation. The generator was either synchronized using a synchronized block or using an instance of {{java.util.concurrent.ReentrantLock}} with either a fair or unfair policy. For each synchronization a block of numbers was computed. h2. Locks vs synchronized The use of {{synchronized}} results in variable timing results. !well_sync_performance.jpg! Compare this to the use of the {{ReentrantLock}}: !well_unfair_performance.jpg! Here the relative time decreases in an expected fashion as the block size increases and the number of synchronisations reduces. Other timings made using the unfair version of the {{ReentrantLock}} are all consistent. Doubling the size of the seed results in doubling of timings. This is not true for the synchronized block and timings are not consistent. The inconsistencies are typically that the time is much faster than would be expected. Thus the JVM is performing some optimisation for the single thread benchmarks with synchronized blocks to allow the exclusive lock on the object to be regained efficiently. Thus we see that on a single thread the {{ReentrantLock}} is slower than synchronized: !well_lock_vs_sync1.png! But for 4 threads the {{ReentrantLock}} with an unfair policy moves ahead of the synchronized block: !well_lock_vs_sync4.jpg! The relative performance is highly variable but the lock is typically faster. A key point is the lock is comparable to the synchronized block on 1 thread when the RNG is used in blocks, and better when used on multiple threads. Note that the {{ReentrantLock}} allows setting a fair policy where it makes threads wait in line. This is comparable to the unfair policy on a single thread but slow when there is thread contention: !well_unfair_vs_fair.jpg! 
Only when the block size is large and the system has fewer synchronization events to process does the fair policy compare well to the unfair policy. Note that the unfair policy is performing similar thread selection as the synchronized block and the two are considered comparable.
h1. Well44497B vs XorShift1024SPhi
Here is the table of compute times for different seeds on a single thread with no synchronisation:
||Seed||Size||WELL_44497_B||XOR_SHIFT_1024_S_PHI||Relative||
|int[]|2|17.55|6.65|0.38|
|int[]|4|29.25|9.84|0.34|
|int[]|8|52.34|17.56|0.34|
|int[]|16|98.40|31.54|0.32|
|int[]|32|197.01|68.01|0.35|
|int[]|64|380.84|120.60|0.32|
|int[]|128|749.06|250.58|0.33|
|long[]|2|41.12|8.34|0.20|
|long[]|4|78.53|12.10|0.15|
|long[]|8|159.10|21.21|0.13|
|long[]|16|309.06|43.63|0.14|
|long[]|32|610.77|83.59|0.14|
|long[]|64|1,222.49|168.82|0.14|
|long[]|128|2,497.24|406.34|0.16|
So the XorShift1024 generator is much faster. The performance improvement is similar when measured by the time to create seeds on 1 or 4 threads, so I will not show those results.
h1. Optimal Block Size
Here is the relative performance for creating a seed on a single thread using an unfair lock versus creating it with no concurrency provision:
!xor_unfair_int1.png!
!xor_unfair_long1.png!
So using a single synchronisation on each call to produce a value results in most of the time being used to do synchronisation. If we set the limit that the SeedFactory should spend 50% of the time synchronising and 50% of the time creating random values, then the block size should be 4 {{long}} or 8 {{int}} values computed in a single block (i.e. where the plot relative time is approximately 2).
h1. Seed Quality
This was discussed on the dev ML and in this thread; I repeat it here. For a XOR_SHIFT_1024_S_PHI generator the period is 2^1024. If sampled 2^30 times per second (about 1.07 billion/sec), given 31449600 secs/year or approx 2^25, it will take 2^969 years to repeat the period.
Basically it would be impossible to exhaust the period in a single execution of the application. This has the advantage that the output of the XorShift1024SPhi generator natively passes BigCrush. The Well44497B does not unless mixed with another generator:
{noformat}
RNG                               Dieharder    TestU01 (BigCrush)
WELL_44497_B                      0,0,0,0,0    2,3,2,2,2
WELL_44497_B ^ hash code          0,0,0,0,0    2,2,2,4,2
WELL_44497_B ^ ThreadLocalRandom  0,0,0,0,0    0,0,0,1,0
WELL_44497_B ^ SplitMix64         0,0,0,0,0    2,0,0,0,1
{noformat}
Note the test failures for results of the SplitMix64 are on different tests. Only the plain Well44497B generator or the mix with the hash code systematically fail tests.
h1. Conclusions
Modifications can be made to the SeedFactory to improve seeding performance:
* Switch to a XorShift1024SPhi generator
* Avoid {{synchronized}} and use a {{ReentrantLock}} for stable scaling performa
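The period estimate in the Seed Quality section of this comment can be reproduced by working with base-2 logarithms: the time to exhaust a period of 2^1024 values is 2^1024 divided by the sampling rate and by the number of seconds in a year. The sketch below is only a worked check of that arithmetic. {{PeriodEstimate}} and {{log2Years}} are hypothetical names, and a 365.25-day year is an assumption that differs slightly from the 31449600 seconds used in the comment.

```java
/** Hypothetical worked check of the "years until the period repeats" estimate. */
final class PeriodEstimate {

    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    /** Returns log2 of the years needed to output 2^periodBits values at the given rate. */
    static double log2Years(int periodBits, double valuesPerSecond) {
        final double secondsPerYear = 365.25 * 24 * 60 * 60; // assumption: Julian year
        return periodBits - log2(valuesPerSecond) - log2(secondsPerYear);
    }

    public static void main(String[] args) {
        // 2^30 values per second: the exponent is close to 969, as stated in the comment.
        System.out.println(log2Years(1024, Math.pow(2, 30)));
        // A hypothetical slower rate of one value every 15 ns gains about 4 on the exponent.
        System.out.println(log2Years(1024, 1e9 / 15));
    }
}
```

Slowing the generator down changes only the exponent by the base-2 logarithm of the slowdown factor, so the conclusion that the period cannot be exhausted in practice is insensitive to the exact rate.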
[jira] [Updated] (RNG-104) SeedFactory seed creation performance analysis
[ https://issues.apache.org/jira/browse/RNG-104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex D Herbert updated RNG-104: --- Attachment: well_unfair_vs_fair.jpg
[jira] [Updated] (RNG-104) SeedFactory seed creation performance analysis
[ https://issues.apache.org/jira/browse/RNG-104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex D Herbert updated RNG-104: --- Attachment: xor_unfair_long1.png
[jira] [Updated] (RNG-104) SeedFactory seed creation performance analysis
[ https://issues.apache.org/jira/browse/RNG-104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex D Herbert updated RNG-104: --- Attachment: xor_unfair_int1.png
[jira] [Work logged] (MATH-1492) Replace usages of commons-numbers-core methods with equivalent methods from java.lang.Math
[ https://issues.apache.org/jira/browse/MATH-1492?focusedWorklogId=263110&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-263110 ] ASF GitHub Bot logged work on MATH-1492: Author: ASF GitHub Bot Created on: 19/Jun/19 15:31 Start Date: 19/Jun/19 15:31 Worklog Time Spent: 10m Work Description: Schamschi commented on pull request #109: MATH-1492: Replace usages of commons-numbers-core methods with equivalent java.lang.Math methods URL: https://github.com/apache/commons-math/pull/109 Replace usages of the following methods from org.apache.commons.numbers.core.ArithmeticUtils: addAndCheck(int, int) addAndCheck(long, long) mulAndCheck(int, int) mulAndCheck(long, long) subAndCheck(int, int) subAndCheck(long, long) With the following methods from java.lang.Math: addExact(int, int) addExact(long, long) multiplyExact(int, int) multiplyExact(long, long) subtractExact(int, int) subtractExact(long, long) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 263110) Time Spent: 10m Remaining Estimate: 0h > Replace usages of commons-numbers-core methods with equivalent methods from > java.lang.Math > -- > > Key: MATH-1492 > URL: https://issues.apache.org/jira/browse/MATH-1492 > Project: Commons Math > Issue Type: Task >Reporter: Heinrich Bohne >Priority: Minor > Fix For: 4.0 > > Time Spent: 10m > Remaining Estimate: 0h > > The usages of the following methods from > {{org.apache.commons.numbers.core.ArithmeticUtils}}: > {{addAndCheck(int, int)}} > {{addAndCheck(long, long)}} > {{mulAndCheck(int, int)}} > {{mulAndCheck(long, long)}} > {{subAndCheck(int, int)}} > {{subAndCheck(long, long)}} > Can be replaced with the following equivalent methods from {{java.lang.Math}}: > {{addExact(int, int)}} > {{addExact(long, long)}} > {{multiplyExact(int, int)}} > {{multiplyExact(long, long)}} > {{subtractExact(int, int)}} > {{subtractExact(long, long)}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [commons-math] Schamschi opened a new pull request #109: MATH-1492: Replace usages of commons-numbers-core methods with equivalent java.lang.Math methods
Schamschi opened a new pull request #109: MATH-1492: Replace usages of commons-numbers-core methods with equivalent java.lang.Math methods URL: https://github.com/apache/commons-math/pull/109 Replace usages of the following methods from org.apache.commons.numbers.core.ArithmeticUtils: addAndCheck(int, int) addAndCheck(long, long) mulAndCheck(int, int) mulAndCheck(long, long) subAndCheck(int, int) subAndCheck(long, long) With the following methods from java.lang.Math: addExact(int, int) addExact(long, long) multiplyExact(int, int) multiplyExact(long, long) subtractExact(int, int) subtractExact(long, long)
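As a rough illustration of the `java.lang.Math` methods named in this pull request: `addExact`, `multiplyExact` and `subtractExact` throw `ArithmeticException` when the result overflows, instead of silently wrapping around. The class and helper names below are hypothetical and are not taken from the pull request; the sketch only shows the JDK side of the replacement.

```java
/** Hypothetical sketch of the java.lang.Math exact-arithmetic methods. */
final class ExactArithmeticSketch {
    private ExactArithmeticSketch() {}

    /** Returns true if a + b overflows the int range. */
    static boolean addOverflows(int a, int b) {
        try {
            Math.addExact(a, b);
            return false;
        } catch (ArithmeticException e) {
            // Thrown by Math.addExact on overflow.
            return true;
        }
    }

    /** Returns true if a * b overflows the long range. */
    static boolean multiplyOverflows(long a, long b) {
        try {
            Math.multiplyExact(a, b);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    /** Returns true if a - b overflows the int range. */
    static boolean subtractOverflows(int a, int b) {
        try {
            Math.subtractExact(a, b);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }
}
```

For instance, `addOverflows(Integer.MAX_VALUE, 1)` reports an overflow, whereas the plain expression `Integer.MAX_VALUE + 1` would wrap to `Integer.MIN_VALUE` without any signal.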
[jira] [Updated] (RNG-104) SeedFactory seed creation performance analysis
[ https://issues.apache.org/jira/browse/RNG-104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex D Herbert updated RNG-104: --- Attachment: well_sync_performance.jpg well_unfair_performance.jpg well_lock_vs_sync1.png well_lock_vs_sync4.jpg > SeedFactory seed creation performance analysis > -- > > Key: RNG-104 > URL: https://issues.apache.org/jira/browse/RNG-104 > Project: Commons RNG > Issue Type: Task > Components: simple >Affects Versions: 1.3 >Reporter: Alex D Herbert >Assignee: Alex D Herbert >Priority: Minor > Attachments: t1.jpg, t4.jpg, well_lock_vs_sync1.png, > well_lock_vs_sync4.jpg, well_sync_performance.jpg, well_unfair_performance.jpg > > > The SeedFactory is used to create seeds for the random generators. To ensure > thread safety this uses synchronized blocks around a single generator. The > current method only generates a single int or long per synchronisation. > Analyze the performance of this approach. The analysis will investigate > generating multiple values inside each synchronisation around the generator. > This analysis will also investigate methods to supplement the SeedFactory > with fast methods to create seeds. This will use a fast seeding method to > generate a single long value. This can be a seed for a SplitMix generator > used to create a seed of any length. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STATISTICS-11) OVERALL-TASK (not yet split): Designing Robust Class Structure and Architecture
[ https://issues.apache.org/jira/browse/STATISTICS-11?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Nguyen updated STATISTICS-11: - Description:
*LINK TO DEVELOPMENT BRANCH:* *[https://github.com/BBenNguyenn/commons-statistics/tree/STATISTICS-8_Regression_Module/commons-statistics-regression]*
+*[GSoC][STATISTICS][Regression] Architecture Implementation Suggestions*+
Hello, I have some broad general ideas about how the regression module should be structured, as outlined briefly in my proposal with UMLs.
This is the current implementation inside commons-math-stat-regression: [!https://media.discordapp.net/attachments/550006517747810324/578420187460796417/unknown.png?width=400&height=295!|https://cdn.discordapp.com/attachments/550006517747810324/578420187460796417/unknown.png]
*GILLES SADOWSKI:* It seems there is/was an image here but I don't see it. For this kind of information, please use JIRA (and provide the link here).
This is my proposed idea, where the structure was partly inspired by SuanShu since it supported multiple types of regression (including logistic): [https://github.com/aaiyer/SuanShu/tree/master/src/main/java/com/numericalmethod/suanshu/stats/regression/linear]
Disclaimer: I have only studied some econometrics and second year computer science in university, so I have zero professional data engineering experience, but am excited to start learning with this project. So, I don’t currently know the exact needs of data engineers with regard to this module and am learning as I go, which is why I would very much appreciate any input on the kinds of requirements data engineers would want from this regression module.
*GILLES SADOWSKI:* Basing a design on use-cases is very useful. You should collect a range of them (small/large datasets, in-memory/stream, dense/sparse) in order to figure what parts of the code can be common and what requires specialization.
From someone who has used the current implementation or will use this new implementation:
* What would make your life easier?
* What should definitely be kept?
* What should be added/improved?
* Any specific features or design criteria?
* Any changes or radically different approaches to the following idea?
*GILLES SADOWSKI:* Good questions! What are your answers? ;)
Note: OLS, GLS and Logistic regression are the first to be implemented, with a focus on architectural support for further additions. Changes will make use of new Java 8 features, specifically the Java Streams API, to improve performance and readability.
*GILLES SADOWSKI:* +1 I'd suggest to select one and start coding, without fearing that you'll probably have to change a lot of it as more use-cases are collected.
[!https://media.discordapp.net/attachments/550006517747810324/578420230850740225/unknown.png?width=219&height=300!|https://cdn.discordapp.com/attachments/550006517747810324/578420230850740225/unknown.png]
*+Updates to this proposed implementation UML in my proposal:+*
* “statistics-regression-reqLinearMath” will be replaced with EJML as suggested by Mr. Eric Barnhill
* This will include a custom matrix class extended from EJML’s SimpleBase -> StatisticsMatrix
* So if we decide to use an Apache Commons implementation of matrices later on, only this class should be changed internally.
*GILLES SADOWSKI:* Good precaution; but I doubt that we can include everything in a single class. How to best encapsulate the linear algebra (external) library is a subject on its own, worth its own thread: Cramming many questions in a single post makes it likely that some will be missed by some people who might later on question the chosen path. [External dependencies is a sensitive issue, in Commons...]
Also, I remind that we need to take into account the comparative benchmarks which I posted recently. [Even if just to conclude that EJML has overwhelming advantages (which?) that make it more suitable than its "competitors".]
* Abstract classes should have interfaces above them or perhaps just be interfaces if a simpler approach is implemented (ie minimal OOP)
*+Notes about this proposed implementation:+*
* AbstractVariables and its child classes may not be necessary, ie just Estimators and Residuals classes
* Or
[jira] [Commented] (STATISTICS-8) Implementation of regression libraries within common-statistics framework
[ https://issues.apache.org/jira/browse/STATISTICS-8?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867754#comment-16867754 ] Ben Nguyen commented on STATISTICS-8: - I can't change the description so I will put it as a comment here and in the main sub-tasks: [https://github.com/BBenNguyenn/commons-statistics/tree/STATISTICS-8_Regression_Module/commons-statistics-regression] > Implementation of regression libraries within common-statistics framework > - > > Key: STATISTICS-8 > URL: https://issues.apache.org/jira/browse/STATISTICS-8 > Project: Apache Commons Statistics > Issue Type: Task >Reporter: Eric Barnhill >Priority: Major > Attachments: PublicFinalDraft_Nguyen_Proposal_ApacheStatRegression.pdf > > > Apache commons is one of the most widely used resources by Java programmers > around the world. Data related applications are soaring and Java is one of > the most commonly used languages for data engineering. Consequently the > commons-statistics library, currently under development, is likely to find a > widespread audience. > For this project we aim to implement regression methods, arguably the most > widely used techniques in statistics and machine learning, within the Apache > commons framework, in particular within the new commons-statistics library. > The assignee will: > * Use core functionality from the regression sub-libraries of the deprecated > commons-math 4 framework as a starting point > * Create a new, standalone commons component for regression statistics, > focusing first on linear and logistic regression > * Make architectural and design decisions in the commons philosophy, that > is, lightweight standalone components easy to understand and use by a wide > range of Java developers (i.e. 
not a large, omnibus mathematical library with > many degrees of abstraction) > * Draw inspiration from widely used libraries in scikit-learn and R to > design an up-to-date statistics package > * Design unit testing and documentation for these libraries > Particularly challenging design decisions include how to incorporate core > matrix libraries with a minimum of dependencies and redundancies. > We see this project as potentially having a large impact on big data > applications. Java and the JVM are fundamental to popular data engineering > tools like Hadoop and Spark. Regression analyses are however often handled > downstream, on the other side of the "data fence", by tools like Python and > R. A robust and scalable pure Java regression library, easily visible and > accessible through Apache commons, can enable better integration of both > sides of this data divide by enabling many machine learning steps to be > programmed at scale on the Java side. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [commons-vfs] jclx commented on issue #30: VFS-614 MonitorInputStream should not close the stream in "read"
jclx commented on issue #30: VFS-614 MonitorInputStream should not close the stream in "read" URL: https://github.com/apache/commons-vfs/pull/30#issuecomment-503603054 @garydgregory, @boris-petrov Thanks for fixing this. I looked for a release schedule but couldn't find one. Any timeline for when 2.4 will be released? Going to build locally for now.
[jira] [Created] (MATH-1492) Replace usages of commons-numbers-core methods with equivalent methods from java.lang.Math
Heinrich Bohne created MATH-1492: Summary: Replace usages of commons-numbers-core methods with equivalent methods from java.lang.Math Key: MATH-1492 URL: https://issues.apache.org/jira/browse/MATH-1492 Project: Commons Math Issue Type: Task Reporter: Heinrich Bohne Fix For: 4.0 The usages of the following methods from {{org.apache.commons.numbers.core.ArithmeticUtils}}: {{addAndCheck(int, int)}} {{addAndCheck(long, long)}} {{mulAndCheck(int, int)}} {{mulAndCheck(long, long)}} {{subAndCheck(int, int)}} {{subAndCheck(long, long)}} Can be replaced with the following equivalent methods from {{java.lang.Math}}: {{addExact(int, int)}} {{addExact(long, long)}} {{multiplyExact(int, int)}} {{multiplyExact(long, long)}} {{subtractExact(int, int)}} {{subtractExact(long, long)}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (VFS-614) MonitorInputStream should not close the stream in read()
[ https://issues.apache.org/jira/browse/VFS-614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867736#comment-16867736 ] Jochen Wiedmann commented on VFS-614: - I think what's actually intended here is the invocation of onClose() as an indicator of EOF. So, to ensure upward compatibility, I'd suggest either of: 1.) Rather than simply removing the read() call, replace it with onClose(). Or, even better: 2.) Add a new method onEof() and make sure that it is invoked properly. > MonitorInputStream should not close the stream in read() > > > Key: VFS-614 > URL: https://issues.apache.org/jira/browse/VFS-614 > Project: Commons VFS > Issue Type: Bug >Affects Versions: 2.1 >Reporter: Boris Petrov >Assignee: Otto Fowler >Priority: Critical > Fix For: 1.0 > > > Check the following thread for more description: > https://mail-archives.apache.org/mod_mbox/commons-user/201606.mbox/%3C90211dd5-5954-e786-e493-30187e68007b%40profuzdigital.com%3E > And the following repo with a "demo" of the bug: > https://github.com/boris-petrov/vfs-bug -- This message was sent by Atlassian JIRA (v7.6.3#76005)
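The second suggestion in the comment above (a separate onEof() hook, so that reaching end-of-file no longer closes the stream) might look roughly like the sketch below. It is built on the JDK's `FilterInputStream` only; the class `EofNotifyingInputStream`, its fields and the `drain` helper are hypothetical and do not reproduce the actual Commons VFS `MonitorInputStream`.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical sketch; not the Commons VFS MonitorInputStream. */
class EofNotifyingInputStream extends FilterInputStream {
    private boolean eofSeen;
    /** Number of times onEof() ran (exposed for the demo only). */
    int eofCount;
    /** Set only by an explicit close() call. */
    boolean closed;

    EofNotifyingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        final int b = super.read();
        if (b < 0) {
            notifyEof();
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        final int n = super.read(buf, off, len);
        if (n < 0) {
            notifyEof();
        }
        return n;
    }

    /** Fires onEof() the first time EOF is observed; the stream stays open. */
    private void notifyEof() {
        if (!eofSeen) {
            eofSeen = true;
            onEof();
        }
    }

    /** Hook for subclasses; called once when EOF is first reached. */
    protected void onEof() {
        eofCount++;
    }

    @Override
    public void close() throws IOException {
        closed = true;
        super.close();
    }

    /** Reads all bytes and returns {bytesRead, eofCount, closed ? 1 : 0}. */
    static int[] drain(byte[] data) {
        final EofNotifyingInputStream s =
            new EofNotifyingInputStream(new ByteArrayInputStream(data));
        try {
            int count = 0;
            while (s.read() >= 0) {
                count++;
            }
            s.read(); // a second read at EOF does not fire onEof() again
            return new int[] {count, s.eofCount, s.closed ? 1 : 0};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The design choice here is that `read()` never calls `close()`: the EOF notification and the release of the underlying resource are separate events, and only the caller decides when to close.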
[jira] [Commented] (STATISTICS-17) RELEVANT INFORMATION LIST
[ https://issues.apache.org/jira/browse/STATISTICS-17?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867725#comment-16867725 ] Gilles commented on STATISTICS-17: -- We can delete it after you've copied all relevant info to their specific page. > RELEVANT INFORMATION LIST > - > > Key: STATISTICS-17 > URL: https://issues.apache.org/jira/browse/STATISTICS-17 > Project: Apache Commons Statistics > Issue Type: Sub-task >Reporter: Ben Nguyen >Priority: Blocker > > *+Relevant JIRA Issues+* > * OLSMultipleLinearRegression estimates different residuals with different > order of input > ** https://issues.apache.org/jira/browse/MATH-1428 > * SimpleRegression#getSlopeConfidenceInterval recalculates t distribution on > every call > ** https://issues.apache.org/jira/browse/MATH-1441 > +*Relevant Legacy Pull Requests*+ > * [https://github.com/apache/commons-math/pull/106] > ** Related JIRA: > *** Pull request for GLSMultipleLinearRegression > *** https://issues.apache.org/jira/browse/MATH-1482 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-8) Implementation of regression libraries within common-statistics framework
[ https://issues.apache.org/jira/browse/STATISTICS-8?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867721#comment-16867721 ] Gilles commented on STATISTICS-8: - Please post a link to where the code is being developed. > Implementation of regression libraries within common-statistics framework > - > > Key: STATISTICS-8 > URL: https://issues.apache.org/jira/browse/STATISTICS-8 > Project: Apache Commons Statistics > Issue Type: Task >Reporter: Eric Barnhill >Priority: Major > Attachments: PublicFinalDraft_Nguyen_Proposal_ApacheStatRegression.pdf > > > Apache commons is one of the most widely used resources by Java programmers > around the world. Data related applications are soaring and Java is one of > the most commonly used languages for data engineering. Consequently the > commons-statistics library, currently under development, is likely to find a > widespread audience. > For this project we aim to implement regression methods, arguably the most > widely used techniques in statistics and machine learning, within the Apache > commons framework, in particular within the new commons-statistics library. > The assignee will: > * Use core functionality from the regression sub-libraries of the deprecated > commons-math 4 framework as a starting point > * Create a new, standalone commons component for regression statistics, > focusing first on linear and logistic regression > * Make architectural and design decisions in the commons philosophy, that > is, lightweight standalone components easy to understand and use by a wide > range of Java developers (i.e. 
not a large, omnibus mathematical library with > many degrees of abstraction) > * Draw inspiration from widely used libraries in scikit-learn and R to > design an up-to-date statistics package > * Design unit testing and documentation for these libraries > Particularly challenging design decisions include how to incorporate core > matrix libraries with a minimum of dependencies and redundancies. > We see this project as potentially having a large impact on big data > applications. Java and the JVM are fundamental to popular data engineering > tools like Hadoop and Spark. Regression analyses are however often handled > downstream, on the other side of the "data fence", by tools like Python and > R. A robust and scalable pure Java regression library, easily visible and > accessible through Apache commons, can enable better integration of both > sides of this data divide by enabling many machine learning steps to be > programmed at scale on the Java side. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-17) RELEVANT INFORMATION LIST
[ https://issues.apache.org/jira/browse/STATISTICS-17?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867720#comment-16867720 ] Ben Nguyen commented on STATISTICS-17: -- I'm guessing I can't remove this ticket altogether, as if STATISTICS-17 never existed? > RELEVANT INFORMATION LIST > - > > Key: STATISTICS-17 > URL: https://issues.apache.org/jira/browse/STATISTICS-17 > Project: Apache Commons Statistics > Issue Type: Sub-task >Reporter: Ben Nguyen >Priority: Blocker > > *+Relevant JIRA Issues+* > * OLSMultipleLinearRegression estimates different residuals with different > order of input > ** https://issues.apache.org/jira/browse/MATH-1428 > * SimpleRegression#getSlopeConfidenceInterval recalculates t distribution on > every call > ** https://issues.apache.org/jira/browse/MATH-1441 > +*Relevant Legacy Pull Requests*+ > * [https://github.com/apache/commons-math/pull/106] > ** Related JIRA: > *** Pull request for GLSMultipleLinearRegression > *** https://issues.apache.org/jira/browse/MATH-1482 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (STATISTICS-17) RELEVANT INFORMATION LIST
[ https://issues.apache.org/jira/browse/STATISTICS-17?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ben Nguyen closed STATISTICS-17. Resolution: Not A Problem > RELEVANT INFORMATION LIST > - > > Key: STATISTICS-17 > URL: https://issues.apache.org/jira/browse/STATISTICS-17 > Project: Apache Commons Statistics > Issue Type: Sub-task >Reporter: Ben Nguyen >Priority: Blocker > > *+Relevant JIRA Issues+* > * OLSMultipleLinearRegression estimates different residuals with different > order of input > ** https://issues.apache.org/jira/browse/MATH-1428 > * SimpleRegression#getSlopeConfidenceInterval recalculates t distribution on > every call > ** https://issues.apache.org/jira/browse/MATH-1441 > +*Relevant Legacy Pull Requests*+ > * [https://github.com/apache/commons-math/pull/106] > ** Related JIRA: > *** Pull request for GLSMultipleLinearRegression > *** https://issues.apache.org/jira/browse/MATH-1482 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STATISTICS-17) RELEVANT INFORMATION LIST
[ https://issues.apache.org/jira/browse/STATISTICS-17?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867711#comment-16867711 ] Gilles commented on STATISTICS-17: -- Please open one JIRA report per issue. Links to other JIRA issues are best made using "Link" in the "More" menu. > RELEVANT INFORMATION LIST > - > > Key: STATISTICS-17 > URL: https://issues.apache.org/jira/browse/STATISTICS-17 > Project: Apache Commons Statistics > Issue Type: Sub-task >Reporter: Ben Nguyen >Priority: Blocker > > *+Relevant JIRA Issues+* > * OLSMultipleLinearRegression estimates different residuals with different > order of input > ** https://issues.apache.org/jira/browse/MATH-1428 > * SimpleRegression#getSlopeConfidenceInterval recalculates t distribution on > every call > ** https://issues.apache.org/jira/browse/MATH-1441 > +*Relevant Legacy Pull Requests*+ > * [https://github.com/apache/commons-math/pull/106] > ** Related JIRA: > *** Pull request for GLSMultipleLinearRegression > *** https://issues.apache.org/jira/browse/MATH-1482 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (MATH-1490) Percentile computational accuracy issue
[ https://issues.apache.org/jira/browse/MATH-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867589#comment-16867589 ] Virendra Singh commented on MATH-1490: -- Hi [~lingchao], I looked into your issue and checked it on versions 2.2 and 3.6 with your test case. I found that version 2.2 gives the Percentile value = *7.55* while version 3.6 gives the Percentile value = *7.551* (for this particular test case). It can be solved if you add a delta/tolerance of *1e-15* in the assertEquals(expected,actual,delta) method. > Percentile computational accuracy issue > --- > > Key: MATH-1490 > URL: https://issues.apache.org/jira/browse/MATH-1490 > Project: Commons Math > Issue Type: Bug >Affects Versions: 3.4, 3.4.1, 3.5, 3.6, 3.6.1 > Environment: System: Linux testinglab 4.4.0-131-generic > #157~14.04.1-Ubuntu > Java version "1.8.0_191" > Java(TM) SE Runtime Environment (build 1.8.0_191-b12) > Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode) > > >Reporter: Lingchao Chen >Priority: Major > Labels: performance > Attachments: BugDemo.java > > > The percentile method works well on the older versions, e.g., the version > before 3.4. However, when I update commons-math to a newer version, a > computational accuracy issue appears. There is a backward compatibility > bug behind it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
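The tolerance-based comparison suggested in the comment above can be sketched as follows. The helper `equalsWithin` is a hypothetical stand-in that mirrors the usual meaning of a delta argument (absolute difference no larger than delta); it is not JUnit's own code, and the sample values in the usage note are illustrative, not the values from the attached BugDemo.java.

```java
/** Hypothetical sketch of comparing doubles with an absolute tolerance. */
final class ToleranceSketch {
    private ToleranceSketch() {}

    /**
     * Returns true if the two values are identical or differ by at most delta.
     * The Double.compare check lets equal infinities and NaN compare as equal.
     */
    static boolean equalsWithin(double expected, double actual, double delta) {
        if (Double.compare(expected, actual) == 0) {
            return true;
        }
        return Math.abs(expected - actual) <= delta;
    }
}
```

For example, a result that lands one floating-point step above 7.55, i.e. `Math.nextUp(7.55)`, fails an exact `==` comparison but passes `equalsWithin(7.55, Math.nextUp(7.55), 1e-15)`, because the spacing between adjacent doubles near 7.55 is about 8.9e-16.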
[jira] [Updated] (STATISTICS-18) Port Percentile class from Commons-Maths
[ https://issues.apache.org/jira/browse/STATISTICS-18?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Virendra Singh updated STATISTICS-18: - Description: Port the Percentile class from Commons-Maths in Commons-Statistics. > Port Percentile class from Commons-Maths > > > Key: STATISTICS-18 > URL: https://issues.apache.org/jira/browse/STATISTICS-18 > Project: Apache Commons Statistics > Issue Type: Sub-task >Reporter: Virendra Singh >Priority: Blocker > > Port the Percentile class from Commons-Maths in Commons-Statistics. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STATISTICS-18) Port Percentile class from Commons-Maths
[ https://issues.apache.org/jira/browse/STATISTICS-18?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Virendra Singh updated STATISTICS-18: - Description: (was: +*Relevant Jira issues:*+ * Percentile computational accuracy issue https://issues.apache.org/jira/browse/MATH-1491) > Port Percentile class from Commons-Maths > > > Key: STATISTICS-18 > URL: https://issues.apache.org/jira/browse/STATISTICS-18 > Project: Apache Commons Statistics > Issue Type: Sub-task >Reporter: Virendra Singh >Priority: Blocker > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STATISTICS-18) Port Percentile class from Commons-Maths
[ https://issues.apache.org/jira/browse/STATISTICS-18?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Virendra Singh updated STATISTICS-18: - Summary: Port Percentile class from Commons-Maths (was: Information relevant to STATISTICS-7) > Port Percentile class from Commons-Maths > > > Key: STATISTICS-18 > URL: https://issues.apache.org/jira/browse/STATISTICS-18 > Project: Apache Commons Statistics > Issue Type: Sub-task >Reporter: Virendra Singh >Priority: Blocker > > +*Relevant Jira issues:*+ > * Percentile computational accuracy issue > https://issues.apache.org/jira/browse/MATH-1491 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (NUMBERS-100) Code in file FractionTest.java is unsatisfactory
[ https://issues.apache.org/jira/browse/NUMBERS-100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Heinrich Bohne closed NUMBERS-100. -- Resolution: Fixed > Code in file FractionTest.java is unsatisfactory > > > Key: NUMBERS-100 > URL: https://issues.apache.org/jira/browse/NUMBERS-100 > Project: Commons Numbers > Issue Type: Improvement > Components: fraction >Reporter: Heinrich Bohne >Priority: Trivial > Time Spent: 1h 50m > Remaining Estimate: 0h > > The following characteristics of the file {{FractionTest.java}} can be > improved: > * The second-to-last try-catch-block in the method {{testAdd()}} is a > duplicate of the preceding try-catch-block and is therefore redundant. > * In the method {{testPow()}}, the conditions {{assertFraction(9, 49, > a.pow(2))}} and {{assertFraction(49, 9, a.pow(-2))}} are tested twice each > (once in the block after {{a}}'s declaration, and a second time in the block > after {{b}}'s declaration. This is probably a typo. > * The last two assertions in the method {{testGetReducedFraction()}} pass the > parameters to the method {{Assert.assertEquals(long, long)}} in the wrong > order (the expected value should go first). > * Several methods in this class contain a number of tests that use shared > local variables but are completely independent of each other because these > local variables get assigned new values at the beginning of a test. The fact > that the scope of these local variables encompasses all those independent > tests makes the code look more confusing than necessary. > * Except for the method {{testGoldenRatio()}}, the throwing of an exception > is tested with a construct involving the swallowing of an exception, rather > than an explicit syntax. > * The helper method {{assertFraction(int, int, Fraction)}} is neglected > throughout large sections of the class in favor of > {{Assert.assertEquals(long, long)}} pairs, increasing the amount of code > duplication. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Work logged] (NUMBERS-100) Code in file FractionTest.java is unsatisfactory
[ https://issues.apache.org/jira/browse/NUMBERS-100?focusedWorklogId=262962&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262962 ] ASF GitHub Bot logged work on NUMBERS-100: -- Author: ASF GitHub Bot Created on: 19/Jun/19 11:00 Start Date: 19/Jun/19 11:00 Worklog Time Spent: 10m Work Description: asfgit commented on pull request #54: [NUMBERS-100] class FractionTest: replace try-catch with assertThrows where appropriate URL: https://github.com/apache/commons-numbers/pull/54 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 262962) Time Spent: 1h 50m (was: 1h 40m) > Code in file FractionTest.java is unsatisfactory > > > Key: NUMBERS-100 > URL: https://issues.apache.org/jira/browse/NUMBERS-100 > Project: Commons Numbers > Issue Type: Improvement > Components: fraction >Reporter: Heinrich Bohne >Priority: Trivial > Time Spent: 1h 50m > Remaining Estimate: 0h > > The following characteristics of the file {{FractionTest.java}} can be > improved: > * The second-to-last try-catch-block in the method {{testAdd()}} is a > duplicate of the preceding try-catch-block and is therefore redundant. > * In the method {{testPow()}}, the conditions {{assertFraction(9, 49, > a.pow(2))}} and {{assertFraction(49, 9, a.pow(-2))}} are tested twice each > (once in the block after {{a}}'s declaration, and a second time in the block > after {{b}}'s declaration. This is probably a typo. > * The last two assertions in the method {{testGetReducedFraction()}} pass the > parameters to the method {{Assert.assertEquals(long, long)}} in the wrong > order (the expected value should go first). 
> * Several methods in this class contain a number of tests that use shared > local variables but are completely independent of each other because these > local variables get assigned new values at the beginning of a test. The fact > that the scope of these local variables encompasses all those independent > tests makes the code look more confusing than necessary. > * Except for the method {{testGoldenRatio()}}, the throwing of an exception > is tested with a construct involving the swallowing of an exception, rather > than an explicit syntax. > * The helper method {{assertFraction(int, int, Fraction)}} is neglected > throughout large sections of the class in favor of > {{Assert.assertEquals(long, long)}} pairs, increasing the amount of code > duplication. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] [commons-numbers] asfgit merged pull request #54: [NUMBERS-100] class FractionTest: replace try-catch with assertThrows where appropriate
asfgit merged pull request #54: [NUMBERS-100] class FractionTest: replace try-catch with assertThrows where appropriate URL: https://github.com/apache/commons-numbers/pull/54
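The change named in this pull request title, replacing a try-catch that swallows the exception with an explicit assertThrows, can be illustrated with the self-contained sketch below. JUnit 5 provides `Assertions.assertThrows` (and JUnit 4.13 added `Assert.assertThrows`); which one the project uses is not stated here, so the sketch defines its own minimal, hypothetical `assertThrows` helper instead of depending on either, and uses `Math.floorDiv(1, 0)` rather than any `Fraction` method as the failing call.

```java
/** Hypothetical sketch contrasting the two styles of testing for an exception. */
final class AssertThrowsSketch {
    private AssertThrowsSketch() {}

    /** Minimal stand-in for a test framework's executable type. */
    interface ThrowingRunnable {
        void run() throws Throwable;
    }

    /** Minimal stand-in for assertThrows: returns the exception or fails. */
    static <T extends Throwable> T assertThrows(Class<T> expected, ThrowingRunnable code) {
        try {
            code.run();
        } catch (Throwable t) {
            if (expected.isInstance(t)) {
                return expected.cast(t);
            }
            throw new AssertionError("Unexpected exception type: " + t.getClass().getName(), t);
        }
        throw new AssertionError("Expected " + expected.getName() + " but nothing was thrown");
    }

    /** Old style: the exception is swallowed, and a missing fail() is easy to forget. */
    static boolean oldStyle() {
        try {
            Math.floorDiv(1, 0);
            return false; // a real test would call fail() here
        } catch (ArithmeticException ignored) {
            return true;
        }
    }

    /** New style: the expectation is stated explicitly in one expression. */
    static ArithmeticException newStyle() {
        return assertThrows(ArithmeticException.class, () -> Math.floorDiv(1, 0));
    }
}
```

Besides being shorter, the explicit form returns the caught exception, so a test can go on to check its message or cause.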
[GitHub] [commons-lang] Kakarot-SSJ4 closed pull request #432: Annotating base classes
Kakarot-SSJ4 closed pull request #432: Annotating base classes URL: https://github.com/apache/commons-lang/pull/432
[GitHub] [commons-lang] Kakarot-SSJ4 opened a new pull request #432: Annotating base classes
Kakarot-SSJ4 opened a new pull request #432: Annotating base classes URL: https://github.com/apache/commons-lang/pull/432
[jira] [Work logged] (NUMBERS-100) Code in file FractionTest.java is unsatisfactory
[ https://issues.apache.org/jira/browse/NUMBERS-100?focusedWorklogId=262953&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262953 ] ASF GitHub Bot logged work on NUMBERS-100: -- Author: ASF GitHub Bot Created on: 19/Jun/19 10:49 Start Date: 19/Jun/19 10:49 Worklog Time Spent: 10m Work Description: coveralls commented on issue #54: [NUMBERS-100] class FractionTest: replace try-catch with assertThrows where appropriate URL: https://github.com/apache/commons-numbers/pull/54#issuecomment-503509586 [![Coverage Status](https://coveralls.io/builds/24081339/badge)](https://coveralls.io/builds/24081339) Coverage remained the same at 94.381% when pulling **8359d0af38ae1384e1c7276c82389f2106e0333f on Schamschi:NUMBERS-100** into **44cdefe8ca22cc3151e32eb956324ed535f5a2e7 on apache:master**. Issue Time Tracking --- Worklog Id: (was: 262953) Time Spent: 1h 40m (was: 1.5h)
[GitHub] [commons-numbers] coveralls commented on issue #54: [NUMBERS-100] class FractionTest: replace try-catch with assertThrows where appropriate
coveralls commented on issue #54: [NUMBERS-100] class FractionTest: replace try-catch with assertThrows where appropriate URL: https://github.com/apache/commons-numbers/pull/54#issuecomment-503509586 [![Coverage Status](https://coveralls.io/builds/24081339/badge)](https://coveralls.io/builds/24081339) Coverage remained the same at 94.381% when pulling **8359d0af38ae1384e1c7276c82389f2106e0333f on Schamschi:NUMBERS-100** into **44cdefe8ca22cc3151e32eb956324ed535f5a2e7 on apache:master**.
[jira] [Commented] (NUMBERS-116) Remove redundant methods in ArithmeticUtils
[ https://issues.apache.org/jira/browse/NUMBERS-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867491#comment-16867491 ]

Gilles commented on NUMBERS-116:

Could you please make a PR (and JIRA report) against the ["master" branch of "Commons Math"|https://gitbox.apache.org/repos/asf?p=commons-math.git] to replace occurrences of these methods there? Thanks.

> Remove redundant methods in ArithmeticUtils
> ---
>
> Key: NUMBERS-116
> URL: https://issues.apache.org/jira/browse/NUMBERS-116
> Project: Commons Numbers
> Issue Type: Improvement
> Components: core
> Reporter: Heinrich Bohne
> Priority: Minor
> Time Spent: 20m
> Remaining Estimate: 0h
>
> As has been [discussed|http://mail-archives.apache.org/mod_mbox/commons-dev/201906.mbox/%3C940f9ff0-0b25-cd31-ddb3-a95ca777ba06%40gmx.at%3E] on the developers' mailing list, the following methods from the class {{ArithmeticUtils}} can be removed:
> {{addAndCheck(int, int)}}
> {{addAndCheck(long, long)}}
> {{mulAndCheck(int, int)}}
> {{mulAndCheck(long, long)}}
> {{subAndCheck(int, int)}}
> {{subAndCheck(long, long)}}
> And their usages replaced with the following equivalent methods from {{java.lang.Math}}:
> {{addExact(int, int)}}
> {{addExact(long, long)}}
> {{multiplyExact(int, int)}}
> {{multiplyExact(long, long)}}
> {{subtractExact(int, int)}}
> {{subtractExact(long, long)}}
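As a small illustration of the `java.lang.Math` replacements named in NUMBERS-116, the sketch below shows that the `*Exact` methods return the ordinary result when it fits and throw `ArithmeticException` on overflow. The `overflows` helper and the class name are hypothetical and exist only for this example; the behaviour of the removed `ArithmeticUtils` methods is not reproduced here.

```java
public class ExactArithmeticSketch {

    /** Hypothetical helper: reports whether the operation signals overflow. */
    static boolean overflows(Runnable operation) {
        try {
            operation.run();
            return false;
        } catch (ArithmeticException e) {
            return true; // Math.*Exact methods throw this on overflow
        }
    }

    public static void main(String[] args) {
        // In-range arguments give the ordinary arithmetic result.
        System.out.println(Math.addExact(2, 3));
        System.out.println(Math.multiplyExact(6L, 7L));
        System.out.println(Math.subtractExact(10, 4));

        // Out-of-range results raise ArithmeticException instead of wrapping around.
        System.out.println(overflows(() -> Math.addExact(Integer.MAX_VALUE, 1)));
        System.out.println(overflows(() -> Math.multiplyExact(Long.MAX_VALUE, 2L)));
        System.out.println(overflows(() -> Math.subtractExact(Integer.MIN_VALUE, 1)));
    }
}
```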
[GitHub] [commons-rng] asfgit merged pull request #43: Upgrade parent to 48.
asfgit merged pull request #43: Upgrade parent to 48. URL: https://github.com/apache/commons-rng/pull/43
[jira] [Work logged] (NUMBERS-100) Code in file FractionTest.java is unsatisfactory
[ https://issues.apache.org/jira/browse/NUMBERS-100?focusedWorklogId=262927&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-262927 ] ASF GitHub Bot logged work on NUMBERS-100: -- Author: ASF GitHub Bot Created on: 19/Jun/19 10:01 Start Date: 19/Jun/19 10:01 Worklog Time Spent: 10m Work Description: Schamschi commented on pull request #54: [NUMBERS-100] class FractionTest: replace try-catch with assertThrows where appropriate URL: https://github.com/apache/commons-numbers/pull/54 This PR does _not_ conflict with [PR #52](https://github.com/apache/commons-numbers/pull/52). Issue Time Tracking --- Worklog Id: (was: 262927) Time Spent: 1.5h (was: 1h 20m)
[GitHub] [commons-numbers] Schamschi opened a new pull request #54: [NUMBERS-100] class FractionTest: replace try-catch with assertThrows where appropriate
Schamschi opened a new pull request #54: [NUMBERS-100] class FractionTest: replace try-catch with assertThrows where appropriate URL: https://github.com/apache/commons-numbers/pull/54 This PR does _not_ conflict with [PR #52](https://github.com/apache/commons-numbers/pull/52).
[jira] [Comment Edited] (CSV-227) first column always quoting when multilingual language, when not on second column
[ https://issues.apache.org/jira/browse/CSV-227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867455#comment-16867455 ]

Daniel Cattlin edited comment on CSV-227 at 6/19/19 9:53 AM:
-

With the default "QuoteMode.MINIMAL" I've seen some pretty weird behaviour too. It's pretty easy to reproduce if you use 2 columns of data that contain the same values and do a side-by-side comparison.

Here is some example data that was output by the CSV writer with unexpected quoting. Notice that any row that starts with a unicode character gets the first field in that row quoted but not the second field, which is the same. This includes the escape character, which I find a bit odd. I also checked that any fields containing the delimiter are quoted just fine.

I saw a similar question on Stack Overflow: [https://stackoverflow.com/questions/36663273/unexpected-quoting-in-apache-commons-csv]

{code:java}
"[QElmqgucZ",[QElmqgucZ
"`K^bPRa\Xm",`K^bPRa\Xm
NJ[\LWwY`Z,NJ[\LWwY`Z
c[n`zOk]qv,c[n`zOk]qv
y[KIphm]Bk,y[KIphm]Bk
"\rin\toDOP",\rin\toDOP
McLbuXeP]a,McLbuXeP]a
"\x`U^BHnVj",\x`U^BHnVj
"_\MzHJA]RO",_\MzHJA]RO
XslXnTQOEc,XslXnTQOEc
"_UHlnX\hNu",_UHlnX\hNu
ObGYlN_`g`,ObGYlN_`g`
"[FazYv\vtd",[FazYv\vtd
{code}

I noticed it may have been fixed for negative numbers? https://issues.apache.org/jira/browse/CSV-171

> first column always quoting when multilingual language, when not on second column
> -
>
> Key: CSV-227
> URL: https://issues.apache.org/jira/browse/CSV-227
> Project: Commons CSV
> Issue Type: Bug
> Components: Parser
> Affects Versions: 1.5
> Reporter: Jisun, Shin
> Priority: Major
>
> When the data includes multilingual characters (UTF-8 encoding), CSVPrinter always quotes the first column only, not the other columns.
>
> {code:java}
> // example code
> CSVFormat format = CSVFormat.DEFAULT.withQuoteMode(QuoteMode.MINIMAL);
> CSVPrinter printer = new CSVPrinter(System.out, format);
> List<String[]> temp = new ArrayList<>();
> temp.add(new String[] { "ㅁㅎㄷㄹ", "ㅁㅎㄷㄹ", "", "test2" });
> temp.add(new String[] { "한글3", "hello3", "3한글3", "test3" });
> temp.add(new String[] { "", "hello4", "", "test4" });
> for (String[] temp1 : temp) {
>     printer.printRecord(temp1);
> }
> printer.close();
> {code}
>
> result =>
> "ㅁㅎㄷㄹ",ㅁㅎㄷㄹ,,test2
> "한글3",hello3,3한글3,test3
> "",hello4,,test4
>
> I found the code. Multilingual characters fall outside the range that ends at 0x7E, so a multilingual value in the first field of a record is always printed with quotes.
>
> {code:java}
> // CSVFormat.class (lines 1173-1178)
> ...
> char c = value.charAt(pos);
>
> // RFC4180 (https://tools.ietf.org/html/rfc4180) TEXTDATA = %x20-21 / %x23-2B / %x2D-7E
> if (newRecord && (c < 0x20 || c > 0x21 && c < 0x23 || c > 0x2B && c < 0x2D || c > 0x7E)) {
>     quote = true;
> } else if (c <= COMMENT) {
> ...
> {code}
>
> Would you fix this bug?
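To make the first-character check quoted in the CSV-227 report easier to follow, here is a minimal sketch that copies only the character-range condition into a standalone predicate. The class and method names are hypothetical, and the surrounding `CSVFormat` logic (including the `newRecord` flag, which restricts the check to the first field of a record) is deliberately left out, so this is not the library's actual code path.

```java
public class FirstCharQuoteSketch {

    /**
     * Copy of the range condition from the report: true when the character
     * lies outside RFC 4180 TEXTDATA (%x20-21 / %x23-2B / %x2D-7E).
     */
    static boolean outsideTextData(char c) {
        return c < 0x20 || c > 0x21 && c < 0x23 || c > 0x2B && c < 0x2D || c > 0x7E;
    }

    public static void main(String[] args) {
        // U+3141 and U+D55C are the first characters of the first two records in the report.
        System.out.println(outsideTextData('\u3141')); // true: above 0x7E
        System.out.println(outsideTextData('\uD55C')); // true: above 0x7E
        System.out.println(outsideTextData('h'));      // false: inside %x2D-7E
        System.out.println(outsideTextData('"'));      // true: 0x22 is excluded
        System.out.println(outsideTextData(','));      // true: 0x2C is excluded
    }
}
```

Because the original condition is guarded by `newRecord`, only a value in the first column is tested this way, which matches the reporter's observation that the same text in the second column is left unquoted.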
[jira] [Commented] (CSV-227) first column always quoting when multilingual language, when not on second column
[ https://issues.apache.org/jira/browse/CSV-227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16867455#comment-16867455 ] Daniel Cattlin commented on CSV-227: With the default "QuoteMode.MINIMAL" I've seen some pretty weird behaviour too. It's pretty easy to reproduce if you use 2 columns of data that contain the same values and do a side by side comparison. Here are some example data that was output by the CSV writer with unexpected quoting: Notice that any row that starts with a unicode character gets the first field in that row quoted but not the second field which is the same - this includes the escape character, which I find a bit odd. I also checked any fields with the delimiter are quoted just fine. I saw a similar question on Stack Overflow [https://stackoverflow.com/questions/36663273/unexpected-quoting-in-apache-commons-csv] {code:java} "[QElmqgucZ",[QElmqgucZ "`K^bPRa\Xm",`K^bPRa\Xm NJ[\LWwY`Z,NJ[\LWwY`Z c[n`zOk]qv,c[n`zOk]qv y[KIphm]Bk,y[KIphm]Bk "\rin\toDOP",\rin\toDOP McLbuXeP]a,McLbuXeP]a "\x`U^BHnVj",\x`U^BHnVj "_\MzHJA]RO",_\MzHJA]RO XslXnTQOEc,XslXnTQOEc "-UHlnX\hNu",-UHlnX\hNu ObGYlN_`g`,ObGYlN_`g` "[FazYv\vtd",[FazYv\vtd{code}
[jira] [Reopened] (CODEC-166) Base64 could be faster
[ https://issues.apache.org/jira/browse/CODEC-166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jochen Wiedmann reopened CODEC-166:
---
Assignee: Jochen Wiedmann (was: Julius Davies)

> Base64 could be faster
> --
>
> Key: CODEC-166
> URL: https://issues.apache.org/jira/browse/CODEC-166
> Project: Commons Codec
> Issue Type: Wish
> Affects Versions: 1.7
> Reporter: Julius Davies
> Assignee: Jochen Wiedmann
> Priority: Major
> Fix For: 2.0
>
> Attachments: CODEC-166.patch, CODEC-166_speed.patch, base64bench.zip
>
> Our Base64 consistently performs 3 times slower compared to MiGBase64 and iHarder in the byte[] and String encode() methods.
> We are pretty good on decode(), though a little slower (approx. 33% slower) than MiGBase64.
> We always win in the Streaming methods (MiGBase64 doesn't do streaming). Yay! :-) :-) :-)
> I put together a benchmark. Here's a typical run:
> {noformat}
> LARGE DATA new byte[12345]
> iHarder...
> encode 486.0 MB/s    decode 158.0 MB/s
> encode 491.0 MB/s    decode 148.0 MB/s
> MiGBase64...
> encode 499.0 MB/s    decode 222.0 MB/s
> encode 493.0 MB/s    decode 226.0 MB/s
> Apache Commons Codec...
> encode 142.0 MB/s    decode 146.0 MB/s
> encode 138.0 MB/s    decode 150.0 MB/s
> {noformat}
> I believe the main approach we can consider to improve performance is to avoid array copies at all costs. MiGBase64 even counts the number of valid Base64 characters ahead of time on decode() to precalculate the result's size and avoid any array copying!
> I suspect this will mean writing out separate execution paths for the String and byte[] methods, and keeping them out of the streaming logic, since the streaming logic is founded on array copy.
> Unfortunately this means we will diminish internal reuse of the streaming implementation, but I think it's the only way to improve performance, if we want to.
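The idea of precalculating the result size, mentioned in the CODEC-166 description, can be illustrated with a small sketch. It is not the MiGBase64 or Commons Codec implementation; the class and method names are hypothetical, only the standard Base64 alphabet is assumed, and `java.util.Base64` is used solely to cross-check the computed sizes.

```java
import java.util.Base64;

public class Base64SizeSketch {

    /** Padded Base64 length for rawLength bytes: 4 output chars per 3 input bytes. */
    static int encodedLength(int rawLength) {
        return 4 * ((rawLength + 2) / 3);
    }

    /** Counts characters of the standard alphabet, skipping padding and line breaks. */
    static int countValidChars(CharSequence encoded) {
        int count = 0;
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c >= 'A' && c <= 'Z' || c >= 'a' && c <= 'z'
                    || c >= '0' && c <= '9' || c == '+' || c == '/') {
                count++;
            }
        }
        return count;
    }

    /** Each valid character carries 6 bits, so the decoded size is count * 6 / 8. */
    static int decodedLength(CharSequence encoded) {
        return (int) (countValidChars(encoded) * 3L / 4);
    }

    public static void main(String[] args) {
        byte[] data = new byte[12345]; // same size as the benchmark's LARGE DATA case
        String encoded = Base64.getEncoder().encodeToString(data);
        // Both sizes can be known before any output array is allocated.
        System.out.println(encodedLength(data.length) == encoded.length());
        System.out.println(decodedLength(encoded) == data.length);
    }
}
```

Knowing the exact size up front lets a decoder allocate its output array once, which is the array-copy saving the description refers to.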