[jira] [Commented] (IMPALA-10627) Use standard Iceberg table properties

2021-07-06 Thread Attila Jeges (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17375753#comment-17375753
 ] 

Attila Jeges commented on IMPALA-10627:
---

https://gerrit.cloudera.org/#/c/17654/

> Use standard Iceberg table properties
> -
>
> Key: IMPALA-10627
> URL: https://issues.apache.org/jira/browse/IMPALA-10627
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Attila Jeges
>Priority: Major
>  Labels: impala-iceberg
>
> Iceberg lists the following properties:
> [https://iceberg.apache.org/configuration/]
> We should also use these properties if possible, e.g. write.format.default, 
> write..compression-codec
> Currently, Impala uses the table property 'iceberg.file_format' to determine 
> the data file format for reads/writes. In the future, read operations should 
> automatically detect the file formats (IMPALA-10610), but for writes we 
> should use 'write.format.default'.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-10627) Use standard Iceberg table properties

2021-07-20 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17384539#comment-17384539
 ] 

ASF subversion and git services commented on IMPALA-10627:
--

Commit fabe994d1fb011afb88d1f0f5bf078113775c9db in impala's branch 
refs/heads/master from Attila Jeges
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=fabe994 ]

IMPALA-10627: Use standard parquet-related Iceberg table properties

This patch adds support for the following standard Iceberg properties:

write.parquet.compression-codec:
  Parquet compression codec. Supported values are: NONE, GZIP, SNAPPY
  (default value), LZ4, ZSTD. The table property will be ignored if
  COMPRESSION_CODEC query option is set.

write.parquet.compression-level:
  Parquet compression level. Used with ZSTD compression only.
  Supported range is [1, 22]. Default value is 3. The table property
  will be ignored if COMPRESSION_CODEC query option is set.

write.parquet.row-group-size-bytes:
  Parquet row group size in bytes. Supported range is [8388608,
  2146435072] (8MB - 2047MB). The table property will be ignored if
  PARQUET_FILE_SIZE query option is set.
  If neither the table property nor the PARQUET_FILE_SIZE query option
  is set, the way Impala calculates row group size will remain
  unchanged.

write.parquet.page-size-bytes:
  Parquet page size in bytes. Used for PLAIN encoding. Supported range
  is [65536, 1073741824] (64KB - 1GB).
  If the table property is unset, the way Impala calculates page size
  will remain unchanged.

write.parquet.dict-size-bytes:
  Parquet dictionary page size in bytes. Used for dictionary encoding.
  Supported range is [65536, 1073741824] (64KB - 1GB).
  If the table property is unset, the way Impala calculates dictionary
  page size will remain unchanged.
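
The ranges and precedence rules above can be illustrated with a minimal
Python sketch. It is not Impala's implementation: the function names
(validate_property, effective_compression) are hypothetical, and treating
codec names as case-insensitive is an assumption. Only the property names,
ranges, and defaults are taken from the description above.

```python
# Illustrative sketch only; helper names are hypothetical, not Impala's API.

CODEC_KEY = "write.parquet.compression-codec"
LEVEL_KEY = "write.parquet.compression-level"

# Supported codecs and inclusive ranges as listed in the commit message.
CODECS = {"NONE", "GZIP", "SNAPPY", "LZ4", "ZSTD"}
RANGES = {
    LEVEL_KEY: (1, 22),
    "write.parquet.row-group-size-bytes": (8388608, 2146435072),  # 8MB - 2047MB
    "write.parquet.page-size-bytes": (65536, 1073741824),         # 64KB - 1GB
    "write.parquet.dict-size-bytes": (65536, 1073741824),         # 64KB - 1GB
}


def validate_property(name, value):
    """Return the parsed value, or raise ValueError if it is unsupported."""
    if name == CODEC_KEY:
        codec = value.upper()  # assumption: codec names are case-insensitive
        if codec not in CODECS:
            raise ValueError(f"unsupported codec: {value}")
        return codec
    low, high = RANGES[name]
    number = int(value)
    if not low <= number <= high:
        raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    return number


def effective_compression(table_props, query_options):
    """Return (codec, level); level is None unless ZSTD comes from table props."""
    if "COMPRESSION_CODEC" in query_options:
        # The query option wins; both table properties are ignored.
        return query_options["COMPRESSION_CODEC"], None
    codec = validate_property(CODEC_KEY, table_props.get(CODEC_KEY, "SNAPPY"))
    if codec != "ZSTD":
        return codec, None  # the level is used with ZSTD only
    return codec, validate_property(LEVEL_KEY, table_props.get(LEVEL_KEY, "3"))
```

For example, effective_compression({CODEC_KEY: "ZSTD"}, {}) falls back to
the default level 3, while setting COMPRESSION_CODEC in the query options
bypasses the table properties entirely.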

This patch also renames 'iceberg.file_format' table property to
'write.format.default' which is the standard Iceberg name for the
table property.

Change-Id: I3b8aa9a52c13c41b48310d2f7c9c7426e1ff5f23
Reviewed-on: http://gerrit.cloudera.org:8080/17654
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 





[jira] [Commented] (IMPALA-10627) Use standard Iceberg table properties

2021-07-23 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386461#comment-17386461
 ] 

ASF subversion and git services commented on IMPALA-10627:
--

Commit 0061bd3433db5b72b6c6bf2f6646255272976cda in impala's branch 
refs/heads/master from Attila Jeges
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=0061bd3 ]

IMPALA-10820: Fix calculating default block size for Parquet files

This patch fixes a bug introduced in IMPALA-10627. Because of the bug,
the wrong default block size was used for Parquet files, which broke the
TestInsertWideTable.test_insert_wide_table e2e test.

Testing:
- Ran test_insert_wide_table with the exhaustive strategy.

Change-Id: Iac8c6dd80dfe84cb7b3d2106713eae87ce923934
Reviewed-on: http://gerrit.cloudera.org:8080/17719
Reviewed-by: Zoltan Borok-Nagy 
Tested-by: Impala Public Jenkins 





[jira] [Commented] (IMPALA-10627) Use standard Iceberg table properties

2021-10-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423859#comment-17423859
 ] 

ASF subversion and git services commented on IMPALA-10627:
--

Commit d2f866f9a17c2d71fb3e3e731a2dfcce68d336d9 in impala's branch 
refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d2f866f ]

IMPALA-10935: Impala crashes on old Iceberg table property

With IMPALA-10627 we switched to use standard Iceberg table
properties: https://iceberg.apache.org/configuration/

E.g. we switched from 'iceberg.file_format' to 'write.format.default'.
For backward compatibility we still support 'iceberg.file_format', but
that support is imperfect and causes a crash in some cases.

Impala crashes when all of the following conditions are met:
* local catalog mode is being used
* an Iceberg table is being queried
* the data file format is ORC
* 'iceberg.file_format' is set instead of the 'write.format.default'
  table property
* the query is "select count(*) from t;"

Impala wrongly assumes that PARQUET is being used and tries to apply the
count star optimization. That optimization is not implemented for the
ORC scanner, so the scanner crashes.

This patch fixes the wrong assumption. It also fixes the HdfsOrcScanner,
so that in release mode it raises an error instead of crashing.
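
The corrected behavior can be sketched in Python as follows. This is an
illustration, not the actual patch: the function names are hypothetical,
and two details are assumptions rather than statements from this commit
message, namely that the standard property takes precedence when both
properties are set, and that PARQUET is the default when neither is set.

```python
# Illustrative sketch only; not the code changed by this patch.

STANDARD_KEY = "write.format.default"
LEGACY_KEY = "iceberg.file_format"  # still honored for backward compatibility


def resolve_file_format(table_props, default="PARQUET"):
    """Return the table's file format, consulting both property names."""
    # Assumption: the standard property wins if both are present.
    for key in (STANDARD_KEY, LEGACY_KEY):
        if key in table_props:
            return table_props[key].upper()
    # Assumption: PARQUET is used when neither property is set.
    return default


def can_use_count_star_optimization(table_props):
    """Only offer the optimization to the scanner that implements it."""
    return resolve_file_format(table_props) == "PARQUET"
```

With this logic, a table that sets only 'iceberg.file_format' to ORC is
resolved as ORC, so the count star optimization is not attempted for it.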

This patch also enables UNSETting the file format table property for
Iceberg tables. This table property was already enabled for
modifications (changing the value via SET TBLPROPERTIES).

Testing:
 * added e2e test for the above conditions

Change-Id: Iafd9baef1c124d7356a14ba24c571567629a5e50
Reviewed-on: http://gerrit.cloudera.org:8080/17877
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Use standard Iceberg table properties
> -
>
> Key: IMPALA-10627
> URL: https://issues.apache.org/jira/browse/IMPALA-10627
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Attila Jeges
>Priority: Major
>  Labels: impala-iceberg
> Fix For: Impala 4.1.0
>
>
> Iceberg lists the following properties:
> [https://iceberg.apache.org/configuration/]
> We should also use these properties if possible, e.g. write.format.default, 
> write..compression-codec
> Currently, Impala uses the table property 'iceberg.file_format' to determine 
> the data file format for reads/writes. In the future, read operations should 
> automatically detect the file formats (IMPALA-10610), but for writes we 
> should use 'write.format.default'.


