[jira] [Commented] (PARQUET-1246) Ignore float/double statistics in case of NaN

2018-04-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449841#comment-16449841
 ] 

ASF GitHub Bot commented on PARQUET-1246:
-

gszadovszky opened a new pull request #468: PARQUET-1246: Ignore float/double 
statistics in case of NaN
URL: https://github.com/apache/parquet-mr/pull/468
 
 
   Because of the ambiguous sorting order of float/double, the following changes 
were made to the read path of the related statistics:
   - Ignoring the statistics if they contain a NaN value.
   - Using -0.0 as the min value and +0.0 as the max value, regardless of which 0.0 
value was saved in the statistics.
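
The two read-path rules above can be sketched in isolation as follows. This is only an illustration under stated assumptions, not the parquet-mr code: the class name ZeroNanStatsSketch and the method normalizeForReading are hypothetical, and returning null stands in for "treat the statistics as missing".

```java
final class ZeroNanStatsSketch {
    /**
     * Hypothetical helper: returns the {min, max} pair a reader should use,
     * or null when the statistics must be ignored.
     */
    public static double[] normalizeForReading(double min, double max) {
        // Rule 1: the ordering is undefined once NaN is involved, so ignore the stats.
        if (Double.isNaN(min) || Double.isNaN(max)) {
            return null;
        }
        // Rule 2: widen a zero bound so that neither -0.0 nor +0.0 can be skipped.
        // Double.compare distinguishes the two zeros; the == operator does not.
        if (Double.compare(min, 0.0) == 0) {
            min = -0.0;
        }
        if (Double.compare(max, -0.0) == 0) {
            max = 0.0;
        }
        return new double[] { min, max };
    }

    public static void main(String[] args) {
        double[] widened = normalizeForReading(0.0, -0.0);
        System.out.println(widened[0] + " .. " + widened[1]);
        System.out.println(normalizeForReading(Double.NaN, 1.0) == null);
    }
}
```

A caller would fall back to scanning the data whenever the helper returns null, which trades some filtering efficiency for correctness.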
   
   Author: Gabor Szadovszky 
   
   Closes #461 from gszadovszky/PARQUET-1246 and squashes the following commits:
   
   20e9332 [Gabor Szadovszky] PARQUET-1246: Changes according to zi's comments
   3447938 [Gabor Szadovszky] PARQUET-1246: Ignore float/double statistics in 
case of NaN
   
   This change is based on 0a86429939075984edce5e3b8195dfb7f9e3ab6b but is not 
a clean cherry-pick.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Ignore float/double statistics in case of NaN
> -
>
> Key: PARQUET-1246
> URL: https://issues.apache.org/jira/browse/PARQUET-1246
> Project: Parquet
> Issue Type: Bug
> Affects Versions: 1.8.1
> Reporter: Gabor Szadovszky
> Assignee: Gabor Szadovszky
> Priority: Major
> Fix For: 1.10.0
>
>
> The sorting order of floating point values is not properly specified; 
> therefore NaN values can cause valid values to be skipped when filtering. See 
> PARQUET-1222 for more info.
> This issue is for ignoring statistics for float/double if they contain NaN, to 
> prevent data loss on the read path when filtering.
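
One way the problem described above can arise is sketched below. The sketch rests on two assumptions that are not taken from this thread: a writer that tracks min/max with the primitive < and > operators, and a reader that prunes with the total order of Double.compare (where NaN sorts above every other value). The class NanStatsPitfall and its methods are hypothetical names.

```java
final class NanStatsPitfall {
    // Hypothetical writer: tracks min/max with primitive comparisons.
    public static double[] minMax(double[] values) {
        double min = values[0];
        double max = values[0];
        for (double v : values) {
            if (v < min) min = v; // always false once min is NaN
            if (v > max) max = v; // always false once max is NaN
        }
        return new double[] { min, max };
    }

    // Hypothetical reader: for a "column == value" filter, drop the block when
    // value lies outside [min, max] under Double.compare's total order.
    public static boolean canDrop(double value, double min, double max) {
        return Double.compare(value, min) < 0 || Double.compare(value, max) > 0;
    }

    public static void main(String[] args) {
        // The block really contains 5.0, but NaN came first and poisoned min/max.
        double[] stats = minMax(new double[] { Double.NaN, 1.0, 5.0 });
        System.out.println(canDrop(5.0, stats[0], stats[1])); // prints true
    }
}
```

Under these assumptions the block holding 5.0 is dropped, which is the kind of data loss that ignoring NaN-containing statistics avoids.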



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-1246) Ignore float/double statistics in case of NaN

2018-03-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16404817#comment-16404817
 ] 

ASF GitHub Bot commented on PARQUET-1246:
-

zivanfi closed pull request #461: PARQUET-1246: Ignore float/double statistics 
in case of NaN
URL: https://github.com/apache/parquet-mr/pull/461
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java b/parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java
index a087c5f70..6888ad61b 100644
--- a/parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java
+++ b/parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java
@@ -73,6 +73,72 @@ public Builder withNumNulls(long numNulls) {
     }
   }
 
+  // Builder for FLOAT type to handle special cases of min/max values like NaN, -0.0, and 0.0
+  private static class FloatBuilder extends Builder {
+    public FloatBuilder(PrimitiveType type) {
+      super(type);
+      assert type.getPrimitiveTypeName() == PrimitiveTypeName.FLOAT;
+    }
+
+    @Override
+    public Statistics build() {
+      FloatStatistics stats = (FloatStatistics) super.build();
+      if (stats.hasNonNullValue()) {
+        Float min = stats.genericGetMin();
+        Float max = stats.genericGetMax();
+        // Drop min/max values in case of NaN as the sorting order of values is undefined for this case
+        if (min.isNaN() || max.isNaN()) {
+          stats.setMinMax(0.0f, 0.0f);
+          ((Statistics) stats).hasNonNullValue = false;
+        } else {
+          // Updating min to -0.0 and max to +0.0 to ensure that no 0.0 values would be skipped
+          if (Float.compare(min, 0.0f) == 0) {
+            min = -0.0f;
+            stats.setMinMax(min, max);
+          }
+          if (Float.compare(max, -0.0f) == 0) {
+            max = 0.0f;
+            stats.setMinMax(min, max);
+          }
+        }
+      }
+      return stats;
+    }
+  }
+
+  // Builder for DOUBLE type to handle special cases of min/max values like NaN, -0.0, and 0.0
+  private static class DoubleBuilder extends Builder {
+    public DoubleBuilder(PrimitiveType type) {
+      super(type);
+      assert type.getPrimitiveTypeName() == PrimitiveTypeName.DOUBLE;
+    }
+
+    @Override
+    public Statistics build() {
+      DoubleStatistics stats = (DoubleStatistics) super.build();
+      if (stats.hasNonNullValue()) {
+        Double min = stats.genericGetMin();
+        Double max = stats.genericGetMax();
+        // Drop min/max values in case of NaN as the sorting order of values is undefined for this case
+        if (min.isNaN() || max.isNaN()) {
+          stats.setMinMax(0.0, 0.0);
+          ((Statistics) stats).hasNonNullValue = false;
+        } else {
+          // Updating min to -0.0 and max to +0.0 to ensure that no 0.0 values would be skipped
+          if (Double.compare(min, 0.0) == 0) {
+            min = -0.0;
+            stats.setMinMax(min, max);
+          }
+          if (Double.compare(max, -0.0) == 0) {
+            max = 0.0;
+            stats.setMinMax(min, max);
+          }
+        }
+      }
+      return stats;
+    }
+  }
+
   private final PrimitiveType type;
   private final PrimitiveComparator comparator;
   private boolean hasNonNullValue;
@@ -154,8 +220,15 @@ public static Statistics getStatsBasedOnType(PrimitiveTypeName type) {
    *  type of the column
    * @return builder to create new statistics object
    */
-  public static Builder getBuilder(PrimitiveType type) {
-    return new Builder(type);
+  public static Builder getBuilderForReading(PrimitiveType type) {
+    switch (type.getPrimitiveTypeName()) {
+      case FLOAT:
+        return new FloatBuilder(type);
+      case DOUBLE:
+        return new DoubleBuilder(type);
+      default:
+        return new Builder(type);
+    }
   }
 
   /**
@@ -266,7 +339,7 @@ public void mergeStatistics(Statistics stats) {
    * Abstract method to set min and max values from byte arrays.
    * @param minBytes byte array to set the min value to
    * @param maxBytes byte array to set the max value to
-   * @deprecated will be removed in 2.0.0. Use {@link #getBuilder(PrimitiveType)} instead.
+   * @deprecated will be removed in 2.0.0. Use {@link #getBuilderForReading(PrimitiveType)} instead.
    */
   @Deprecated
   abstract public void setMinMaxFromBytes(byte[] minBytes, byte[] maxBytes);
@@ -401,7 +474,7 @@ public long getNumNulls() {
    *
    * @param nulls
    *  null count to set the count to
-   * @deprecated will be removed in 2.0.0. Use {@link 

[jira] [Commented] (PARQUET-1246) Ignore float/double statistics in case of NaN

2018-03-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398633#comment-16398633
 ] 

ASF GitHub Bot commented on PARQUET-1246:
-

zivanfi commented on a change in pull request #461: PARQUET-1246: Ignore 
float/double statistics in case of NaN
URL: https://github.com/apache/parquet-mr/pull/461#discussion_r174466575
 
 

 ##
 File path: parquet-column/src/test/java/org/apache/parquet/column/statistics/TestStatistics.java
 ##
 @@ -637,4 +648,106 @@ private void testMergingStringStats() {
     assertEquals(stats.getMax(), Binary.fromString(""));
     assertEquals(stats.getMin(), Binary.fromString(""));
   }
+
+  @Test
+  public void testBuilder() {
+    testBuilder(Types.required(BOOLEAN).named("test_boolean"), false, new byte[] { 0 }, true, new byte[] { 1 });
+    testBuilder(Types.required(INT32).named("test_int32"), -42, intToBytes(-42), 42, intToBytes(42));
+    testBuilder(Types.required(INT64).named("test_int64"), -42l, longToBytes(-42), 42l, longToBytes(42));
+    testBuilder(Types.required(FLOAT).named("test_float"), -42.0f, intToBytes(floatToIntBits(-42.0f)), 42.0f,
+        intToBytes(floatToIntBits(42.0f)));
+    testBuilder(Types.required(DOUBLE).named("test_double"), -42.0, longToBytes(doubleToLongBits(-42.0)), 42.0,
+        longToBytes(Double.doubleToLongBits(42.0f)));
+
+    byte[] min = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 };
+    byte[] max = { 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24 };
+    testBuilder(Types.required(INT96).named("test_int96"), Binary.fromConstantByteArray(min), min,
+        Binary.fromConstantByteArray(max), max);
+    testBuilder(Types.required(FIXED_LEN_BYTE_ARRAY).length(12).named("test_fixed"), Binary.fromConstantByteArray(min),
+        min,
+        Binary.fromConstantByteArray(max), max);
+    testBuilder(Types.required(BINARY).named("test_binary"), Binary.fromConstantByteArray(min), min,
+        Binary.fromConstantByteArray(max), max);
+  }
+
+  private void testBuilder(PrimitiveType type, Object min, byte[] minBytes, Object max, byte[] maxBytes) {
+    Statistics.Builder builder = Statistics.getBuilderForReading(type);
+    Statistics stats = builder.build();
+    assertTrue(stats.isEmpty());
+    assertFalse(stats.isNumNullsSet());
+    assertFalse(stats.hasNonNullValue());
+
+    builder = Statistics.getBuilderForReading(type);
+    stats = builder.withNumNulls(0).withMin(minBytes).build();
+    assertFalse(stats.isEmpty());
+    assertTrue(stats.isNumNullsSet());
+    assertFalse(stats.hasNonNullValue());
+    assertEquals(0, stats.getNumNulls());
+
+    builder = Statistics.getBuilderForReading(type);
+    stats = builder.withNumNulls(11).withMax(maxBytes).build();
+    assertFalse(stats.isEmpty());
+    assertTrue(stats.isNumNullsSet());
+    assertFalse(stats.hasNonNullValue());
+    assertEquals(11, stats.getNumNulls());
+
+    builder = Statistics.getBuilderForReading(type);
+    stats = builder.withNumNulls(42).withMin(minBytes).withMax(maxBytes).build();
+    assertFalse(stats.isEmpty());
+    assertTrue(stats.isNumNullsSet());
+    assertTrue(stats.hasNonNullValue());
+    assertEquals(42, stats.getNumNulls());
+    assertEquals(min, stats.genericGetMin());
+    assertEquals(max, stats.genericGetMax());
+  }
+
+  @Test
 
 Review comment:
   Please add a test case for -0 and +0.




[jira] [Commented] (PARQUET-1246) Ignore float/double statistics in case of NaN

2018-03-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398630#comment-16398630
 ] 

ASF GitHub Bot commented on PARQUET-1246:
-

zivanfi commented on a change in pull request #461: PARQUET-1246: Ignore 
float/double statistics in case of NaN
URL: https://github.com/apache/parquet-mr/pull/461#discussion_r174464685
 
 

 ##
 File path: parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java
 ##
 @@ -73,6 +73,70 @@ public Builder withNumNulls(long numNulls) {
     }
   }
 
+  // Builder for FLOAT type to handle special cases of min/max values like NaN, -0.0, and 0.0
+  private static class FloatBuilder extends Builder {
+    public FloatBuilder(PrimitiveType type) {
+      super(type);
+      assert type.getPrimitiveTypeName() == PrimitiveTypeName.FLOAT;
+    }
+
+    @Override
+    public Statistics build() {
+      FloatStatistics stats = (FloatStatistics) super.build();
+      if (stats.hasNonNullValue()) {
+        Float min = stats.genericGetMin();
+        Float max = stats.genericGetMax();
+        // Drop min/max values in case of NaN as the sorting order of values is undefined for this case
+        if (min.isNaN() || max.isNaN()) {
+          stats.setMinMax(0.0f, 0.0f);
+          ((Statistics) stats).hasNonNullValue = false;
+        } else {
+          // Updating min to -0.0 and max to +0.0 to ensure that no 0.0 values would be skipped
+          if (min == 0.0f) {
 
 Review comment:
   +0 and -0 are equal according to ==. Please use Float.compareTo() instead.
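
The difference this review comment points at can be checked with a few lines of plain Java. The class SignedZeroComparison and its helper methods are hypothetical names used only for illustration; Float.compare and Float.compareTo are standard library methods.

```java
final class SignedZeroComparison {
    // The == operator follows IEEE 754 equality, under which +0.0 and -0.0 are equal.
    public static boolean equalByOperator(float a, float b) {
        return a == b;
    }

    // Float.compare imposes a total order in which -0.0f sorts below +0.0f.
    public static boolean equalByCompare(float a, float b) {
        return Float.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        System.out.println(equalByOperator(0.0f, -0.0f)); // prints true
        System.out.println(equalByCompare(0.0f, -0.0f));  // prints false
        // The boxed form behaves like Float.compare.
        System.out.println(Float.valueOf(0.0f).compareTo(-0.0f) > 0); // prints true
    }
}
```

With ==, a check such as `min == 0.0f` also matches a stored -0.0f, so it cannot tell which zero was written; Float.compare can.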




[jira] [Commented] (PARQUET-1246) Ignore float/double statistics in case of NaN

2018-03-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398631#comment-16398631
 ] 

ASF GitHub Bot commented on PARQUET-1246:
-

zivanfi commented on a change in pull request #461: PARQUET-1246: Ignore 
float/double statistics in case of NaN
URL: https://github.com/apache/parquet-mr/pull/461#discussion_r174465226
 
 

 ##
 File path: parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java
 ##
 @@ -73,6 +73,70 @@ public Builder withNumNulls(long numNulls) {
     }
   }
 
+  // Builder for FLOAT type to handle special cases of min/max values like NaN, -0.0, and 0.0
+  private static class FloatBuilder extends Builder {
+    public FloatBuilder(PrimitiveType type) {
+      super(type);
+      assert type.getPrimitiveTypeName() == PrimitiveTypeName.FLOAT;
+    }
+
+    @Override
+    public Statistics build() {
+      FloatStatistics stats = (FloatStatistics) super.build();
+      if (stats.hasNonNullValue()) {
+        Float min = stats.genericGetMin();
+        Float max = stats.genericGetMax();
+        // Drop min/max values in case of NaN as the sorting order of values is undefined for this case
+        if (min.isNaN() || max.isNaN()) {
+          stats.setMinMax(0.0f, 0.0f);
+          ((Statistics) stats).hasNonNullValue = false;
+        } else {
+          // Updating min to -0.0 and max to +0.0 to ensure that no 0.0 values would be skipped
+          if (min == 0.0f) {
+            stats.setMinMax(-0.0f, max);
+            min = -0.0f;
 
 Review comment:
   (nit) In my opinion,
   ```
   min = -0.0f;
   stats.setMinMax(min, max);
   ```
   would be cleaner. Similar for the max case (although max is not used any 
more in this function).




[jira] [Commented] (PARQUET-1246) Ignore float/double statistics in case of NaN

2018-03-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398632#comment-16398632
 ] 

ASF GitHub Bot commented on PARQUET-1246:
-

zivanfi commented on a change in pull request #461: PARQUET-1246: Ignore 
float/double statistics in case of NaN
URL: https://github.com/apache/parquet-mr/pull/461#discussion_r174465693
 
 

 ##
 File path: parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java
 ##
 @@ -73,6 +73,70 @@ public Builder withNumNulls(long numNulls) {
     }
   }
 
+  // Builder for FLOAT type to handle special cases of min/max values like NaN, -0.0, and 0.0
+  private static class FloatBuilder extends Builder {
+    public FloatBuilder(PrimitiveType type) {
+      super(type);
+      assert type.getPrimitiveTypeName() == PrimitiveTypeName.FLOAT;
+    }
+
+    @Override
+    public Statistics build() {
+      FloatStatistics stats = (FloatStatistics) super.build();
+      if (stats.hasNonNullValue()) {
+        Float min = stats.genericGetMin();
+        Float max = stats.genericGetMax();
+        // Drop min/max values in case of NaN as the sorting order of values is undefined for this case
+        if (min.isNaN() || max.isNaN()) {
+          stats.setMinMax(0.0f, 0.0f);
+          ((Statistics) stats).hasNonNullValue = false;
+        } else {
+          // Updating min to -0.0 and max to +0.0 to ensure that no 0.0 values would be skipped
+          if (min == 0.0f) {
+            stats.setMinMax(-0.0f, max);
+            min = -0.0f;
+          }
+          if (max == -0.0f) {
+            stats.setMinMax(min, 0.0f);
+          }
+        }
+      }
+      return stats;
+    }
+  }
+
+  // Builder for DOUBLE type to handle special cases of min/max values like NaN, -0.0, and 0.0
+  private static class DoubleBuilder extends Builder {
+    public DoubleBuilder(PrimitiveType type) {
+      super(type);
+      assert type.getPrimitiveTypeName() == PrimitiveTypeName.DOUBLE;
+    }
+
+    @Override
+    public Statistics build() {
 
 Review comment:
   Same comments as for the float case.




[jira] [Commented] (PARQUET-1246) Ignore float/double statistics in case of NaN

2018-03-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398629#comment-16398629
 ] 

ASF GitHub Bot commented on PARQUET-1246:
-

zivanfi commented on a change in pull request #461: PARQUET-1246: Ignore 
float/double statistics in case of NaN
URL: https://github.com/apache/parquet-mr/pull/461#discussion_r174463317
 
 

 ##
 File path: parquet-column/src/main/java/org/apache/parquet/column/statistics/Statistics.java
 ##
 @@ -73,6 +73,70 @@ public Builder withNumNulls(long numNulls) {
     }
   }
 
+  // Builder for FLOAT type to handle special cases of min/max values like NaN, -0.0, and 0.0
+  private static class FloatBuilder extends Builder {
+    public FloatBuilder(PrimitiveType type) {
+      super(type);
+      assert type.getPrimitiveTypeName() == PrimitiveTypeName.FLOAT;
+    }
+
+    @Override
+    public Statistics build() {
+      FloatStatistics stats = (FloatStatistics) super.build();
+      if (stats.hasNonNullValue()) {
+        Float min = stats.genericGetMin();
+        Float max = stats.genericGetMax();
+        // Drop min/max values in case of NaN as the sorting order of values is undefined for this case
+        if (min.isNaN() || max.isNaN()) {
+          stats.setMinMax(0.0f, 0.0f);
 
 Review comment:
   This seems unnecessary. On the other hand, this is the default value when 
it's unset, so I'm fine with leaving this.




[jira] [Commented] (PARQUET-1246) Ignore float/double statistics in case of NaN

2018-03-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16397217#comment-16397217
 ] 

ASF GitHub Bot commented on PARQUET-1246:
-

gszadovszky opened a new pull request #461: PARQUET-1246: Ignore float/double 
statistics in case of NaN
URL: https://github.com/apache/parquet-mr/pull/461
 
 
   Because of the ambiguous sorting order of float/double, the following changes 
were made to the read path of the related statistics:
   - Ignoring the statistics if they contain a NaN value.
   - Using -0.0 as the min value and +0.0 as the max value, regardless of which 0.0 
value was saved in the statistics.

