Re: Re: Hive's performance for querying the Iceberg table is very poor.

2023-11-10 Thread Simhadri G
Please ensure hive.stats.autogather is enabled as well.

On Fri, Nov 10, 2023, 2:57 PM Denys Kuzmenko  wrote:

> `hive.iceberg.stats.source` controls where the stats should be sourced
> from. When it's set to iceberg (default), we should go directly to iceberg
> and bypass HMS.
>


Re: Re: Hive's performance for querying the Iceberg table is very poor.

2023-11-10 Thread Denys Kuzmenko
`hive.iceberg.stats.source` controls where the stats should be sourced from. 
When it's set to iceberg (default), we should go directly to iceberg and bypass 
HMS. 
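
The three properties mentioned in this thread can be set together in a Hive session. This is only a minimal sketch, assuming a Hive 4.x session where these properties are available; the values shown are the ones the replies ask for, not a verified fix for the slow count(*).

```sql
-- Source stats from Iceberg metadata (stated above to be the default)
SET hive.iceberg.stats.source=iceberg;
-- Allow simple aggregates such as count(*) to be answered from stats
SET hive.compute.query.using.stats=true;
-- Gather basic stats automatically on write
SET hive.stats.autogather=true;
```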


Re: Re: Hive's performance for querying the Iceberg table is very poor.

2023-11-09 Thread Butao Zhang
Can you please check this property? We need to ensure it is true.
set hive.compute.query.using.stats=true;


In addition, it looks like the table created by Spark has a lot of data. Can you 
create a new table, insert a few values with Spark, and then create the 
location_based_table in Hive and run count(*) on it? Does it also launch a Tez 
task to scan the table?
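
The small reproduction suggested above could look roughly like the following. It is a sketch only: the table name count_check and its two columns are hypothetical, the Hive DDL copies the pattern from STEP2 of the quoted message, and the LOCATION is left elided as in the thread.

```sql
-- Spark SQL: create a tiny Iceberg table and insert a few rows
-- (test.dwd.count_check is a hypothetical name)
CREATE TABLE IF NOT EXISTS test.dwd.count_check (id bigint, name string) USING iceberg;
INSERT INTO test.dwd.count_check VALUES (1, 'a'), (2, 'b'), (3, 'c');

-- Hive: register the same location and count the rows
SET hive.compute.query.using.stats=true;
CREATE EXTERNAL TABLE hyt.count_check
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
LOCATION 'hdfs://xxx'
TBLPROPERTIES ('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');
SELECT count(*) FROM hyt.count_check;
```

If the small table is answered without a Tez job but the large one is not, that points at the large table's metadata rather than at the session settings.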



Thanks,

Butao Zhang

 Replied Message 
| From | lisoda |
| Date | 11/9/2023 15:50 |
| To |  |
| Subject | Re:Re: Re: Hive's performance for querying the Iceberg table is 
very poor. |
STEP1:
CREATE TABLE USING SPARK:
CREATE TABLE IF NOT EXISTS test.dwd.test_trade_table(
  `uni_order_id` string,
  `data_from` bigint,
  `partner` string,
  `plat_code` string,
  `order_id` string,
  `uni_shop_id` string,
  `uni_id` string,
  `guide_id` string,
  `shop_id` string,
  `plat_account` string,
  `total_fee` double,
  `item_discount_fee` double,
  `trade_discount_fee` double,
  `adjust_fee` double,
  `post_fee` double,
  `discount_rate` double,
  `payment_no_postfee` double,
  `payment` double,
  `pay_time` string,
  `product_num` bigint,
  `order_status` string,
  `is_refund` string,
  `refund_fee` double,
  `insert_time` string,
  `created` string,
  `endtime` string,
  `modified` string,
  `trade_type` string,
  `receiver_name` string,
  `receiver_country` string,
  `receiver_state` string,
  `receiver_city` string,
  `receiver_district` string,
  `receiver_town` string,
  `receiver_address` string,
  `receiver_mobile` string,
  `trade_source` string,
  `delivery_type` string,
  `consign_time` string,
  `orders_num` bigint,
  `is_presale` bigint,
  `presale_status` string,
  `first_fee_paytime` string,
  `last_fee_paytime` string,
  `first_paid_fee` double,
  `tenant` string,
  `tidb_modified` string,
  `step_paid_fee` double,
  `seller_flag` string,
  `is_used_store_card` BIGINT,
  `store_card_used` DOUBLE,
  `store_card_basic_used` DOUBLE,
  `store_card_expand_use

Re:Re: Re: Hive's performance for querying the Iceberg table is very poor.

2023-11-08 Thread lisoda
rite.format.default'='orc','write.orc.bloom.filter.columns'='order_id','write.orc.compression-codec'='zstd','write.metadata.previous-versions-max'='3','write.metadata.delete-after-commit.enabled'='true')
STORED AS iceberg;


STEP2:
HIVE CREATE EXTERNAL TABLE(location_based_table):
CREATE EXTERNAL TABLE hyt.test_trade
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler' 
LOCATION 'hdfs://xxx'
TBLPROPERTIES 
('iceberg.catalog'='location_based_table','engine.hive.enabled'='true');



STEP3:
select count(*) => scans the whole table
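
One way to check whether count(*) is answered from statistics rather than by a scan is to look at the query plan. The sketch below reuses the table name from STEP2. The exact plan text varies by Hive version, but as far as I know a stats-answered query shows only a fetch stage, while a scan shows Tez vertices with a TableScan operator.

```sql
SET hive.compute.query.using.stats=true;
-- Inspect the plan instead of running the query
EXPLAIN SELECT count(*) FROM hyt.test_trade;
```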








Re: Re: Hive's performance for querying the Iceberg table is very poor.

2023-11-08 Thread Butao Zhang
Could you please provide detailed steps to reproduce this issue?  e.g. how do 
you create the table?



Thanks,

Butao Zhang


Re:Re: Re: Hive's performance for querying the Iceberg table is very poor.

2023-11-08 Thread lisoda
Incidentally, I'm using a COW table, so there is no DELETE_FILE.












Re: Re: Hive's performance for querying the Iceberg table is very poor.

2023-11-08 Thread Butao Zhang
Hi lisoda. You can check this ticket, 
https://issues.apache.org/jira/browse/HIVE-27347, which uses Iceberg basic 
stats to optimize count(*) queries. Note: it does not take effect if the table 
has delete files.
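
Whether a table currently has delete files can be checked from the Iceberg snapshot summary. A sketch in Spark SQL, assuming a Spark session with the Iceberg catalog `test` from the thread's CREATE TABLE statement configured; `total-delete-files` is a standard snapshot summary field, though it may be absent on snapshots written by some engines.

```sql
-- Latest snapshot of the table and its delete-file count
SELECT committed_at,
       snapshot_id,
       operation,
       summary['total-delete-files'] AS total_delete_files
FROM test.dwd.test_trade_table.snapshots
ORDER BY committed_at DESC
LIMIT 1;
```

A value of 0 (or NULL) would be consistent with a copy-on-write table that has no delete files.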



Thanks,

Butao Zhang


Re:Re: Re: Hive's performance for querying the Iceberg table is very poor.

2023-11-08 Thread lisoda
Hi.
I am testing with Hive 4.0.0-beta-1 and I am using a location_based_table.
So far I have found that Hive still can't push some queries down to metadata, 
e.g. COUNT(*).
Does Hive 4.0.0-beta-1 still not support query pushdown?












Re: Re: Hive's performance for querying the Iceberg table is very poor.

2023-10-24 Thread Ayush Saxena
HIVE-27734 is in progress. I see we have a POC attached to the ticket, so
we should have it in 2-3 weeks, I believe.

> Also, after the release of 4.0.0, will we be able to do all TPCDS queries
> on ICEBERG except for normal HIVE tables?

Yep, I believe most of the TPCDS queries would be supported even today on
Hive master, but 4.0.0 would have them running for sure.

-Ayush

On Tue, 24 Oct 2023 at 14:51, lisoda  wrote:

> Thanks.
> I would like to know whether Hive currently supports pushing JOIN
> conditions down to Iceberg table partitions.
> I see HIVE-27734 is not yet complete; what is its progress so far?
> Also, after the release of 4.0.0, will we be able to do all TPCDS queries
> on ICEBERG except for normal HIVE tables?
>
>
>
>
>
> At 2023-10-24 11:03:07, "Ayush Saxena" wrote:
>
> Hi Lisoda,
>
> The Iceberg jar for Hive 3.1.3 doesn't have a lot of changes. We did a
> bunch of improvements on the 4.x line for Hive-Iceberg. You can give
> Iceberg a try on the 4.0.0-beta-1 release mentioned here [1]; we have a
> bunch of improvements like vectorization and stuff like that. If you wanna
> give it a quick try on Docker, we have a Docker image published for that here
> [2] & Iceberg works out of the box there.
>
> Otherwise, feel free to create tickets if you find specific queries or
> scenarios which are problematic; we will be happy to chase them & get them
> sorted.
>
> PS. Not sure about StarRocks, FWIW. That is something we don't develop as
> part of Apache Hive nor as part of the Apache Software Foundation, to the
> best of my knowledge, so I would refrain from commenting about that on the
> "Apache Hive" ML.
>
> -Ayush
>
>
> [1] https://hive.apache.org/general/downloads/
> [2] https://hub.docker.com/r/apache/hive/tags
>
> On Tue, 24 Oct 2023 at 05:28, Albert Wong wrote:
>
>> Too bad.   Tencent Games used StarRocks with Apache Iceberg to power
>> their analytics.
>> https://medium.com/starrocks-engineering/tencent-games-inside-scoop-the-road-to-cloud-native-with-starrocks-d7dcb2438e25.
>>
>>
>> On Mon, Oct 23, 2023 at 10:55 AM lisoda  wrote:
>>
>>> We are not going to use starrocks.
>>> mpp architecture databases have natural limitations, and starrocks does
>>> not necessarily perform better than hive llap.
>>>
>>>
>>>  Replied Message 
>>> From Albert Wong 
>>> Date 10/24/2023 01:39
>>> To user@hive.apache.org
>>> Cc
>>> Subject Re: Hive's performance for querying the Iceberg table is very
>>> poor.
>>> I would try http://starrocks.io.   StarRocks is an MPP OLAP database
>>> that can query Apache Iceberg and we can cache the data for faster
>>> performance.  We also have additional features like building materialized
>>> views that span across Apache Iceberg, Apache Hudi and Apache Hive.   Here
>>> is a video of connecting the 2 products through a webinar StarRocks did
>>> with Tabular (authors of Apache Iceberg).
>>> https://www.youtube.com/watch?v=bAmcTrX7hCI&t=10s
>>>
>>> On Mon, Oct 23, 2023 at 7:18 AM lisoda  wrote:
>>>
>>>> Hi Team.
>>>>   I was recently testing Hive queries against Iceberg tables, and I found
>>>> that the performance is very poor, almost impossible to use in a
>>>> production environment. Also, join conditions cannot be pushed down to
>>>> the Iceberg partitions.
>>>>   I'm using the 1.3.1 Hive runtime jar from the Iceberg community.
>>>>   Currently I'm using Hive 3.1.3 and Iceberg 1.3.1.
>>>>   Now I'm very frustrated because the performance is so bad that I
>>>> can't deliver to my customers. How can I solve this problem?
>>>>  Details:
>>>> https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1695050248606629
>>>> I would be grateful if someone could guide me.
>>>>
>>>


Re:Re: Hive's performance for querying the Iceberg table is very poor.

2023-10-24 Thread lisoda
Thanks.
I would like to know whether Hive currently supports pushing JOIN conditions 
down to Iceberg table partitions.
I see HIVE-27734 is not yet complete; what is its progress so far?
Also, after the release of 4.0.0, will we be able to do all TPCDS queries on 
ICEBERG except for normal HIVE tables?












Re: Hive's performance for querying the Iceberg table is very poor.

2023-10-23 Thread Ayush Saxena
Hi Lisoda,

The Iceberg jar for Hive 3.1.3 doesn't have a lot of changes. We did a
bunch of improvements on the 4.x line for Hive-Iceberg. You can give
Iceberg a try on the 4.0.0-beta-1 release mentioned here [1]; we have a
bunch of improvements like vectorization and stuff like that. If you wanna
give it a quick try on Docker, we have a Docker image published for that here
[2] & Iceberg works out of the box there.

Otherwise, feel free to create tickets if you find specific queries or
scenarios which are problematic; we will be happy to chase them & get them
sorted.

PS. Not sure about StarRocks, FWIW. That is something we don't develop as
part of Apache Hive nor as part of the Apache Software Foundation, to the
best of my knowledge, so I would refrain from commenting about that on the
"Apache Hive" ML.

-Ayush


[1] https://hive.apache.org/general/downloads/
[2] https://hub.docker.com/r/apache/hive/tags
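
For anyone trying the suggestion above, here is a minimal sketch of a first
smoke test on a Hive 4.x session (for example, one started from the Docker
image in [2]). The table name demo_iceberg_tbl and its columns are
hypothetical, and the assumption is that the session already has Iceberg
support available, as described for the 4.0.0-beta-1 image. It is only an
illustration, not a tuning recipe.

```sql
-- Assumption: connected via Beeline to a Hive 4.x HiveServer2.
-- Vectorized execution is one of the 4.x improvements mentioned above.
SET hive.vectorized.execution.enabled=true;

-- Hypothetical table; STORED BY ICEBERG makes it an Iceberg table in Hive 4.x.
CREATE TABLE IF NOT EXISTS demo_iceberg_tbl (
  order_id STRING,
  payment  DOUBLE
)
STORED BY ICEBERG;

INSERT INTO demo_iceberg_tbl VALUES ('o-1', 10.5), ('o-2', 20.0);

-- Inspect the plan to see how the query is executed before timing it.
EXPLAIN SELECT COUNT(*) FROM demo_iceberg_tbl;
SELECT COUNT(*) FROM demo_iceberg_tbl;
```

Comparing the EXPLAIN output and run time of the same query on 3.1.3 and on
4.x is one way to narrow a report down to a specific query before filing a
ticket.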



Re: Hive's performance for querying the Iceberg table is very poor.

2023-10-23 Thread Albert Wong
Too bad. Tencent Games used StarRocks with Apache Iceberg to power their
analytics:
https://medium.com/starrocks-engineering/tencent-games-inside-scoop-the-road-to-cloud-native-with-starrocks-d7dcb2438e25




Re: Hive's performance for querying the Iceberg table is very poor.

2023-10-23 Thread lisoda
We are not going to use StarRocks.
MPP-architecture databases have inherent limitations, and StarRocks does not
necessarily perform better than Hive LLAP.




Re: Hive's performance for querying the Iceberg table is very poor.

2023-10-23 Thread Albert Wong
I would try http://starrocks.io. StarRocks is an MPP OLAP database that
can query Apache Iceberg, and it can cache the data for faster performance.
We also have additional features such as materialized views that span
Apache Iceberg, Apache Hudi and Apache Hive. Here is a video of a webinar
that StarRocks did with Tabular (authors of Apache Iceberg) on connecting
the two products:
https://www.youtube.com/watch?v=bAmcTrX7hCI&t=10s



Hive's performance for querying the Iceberg table is very poor.

2023-10-23 Thread lisoda
Hi Team.
  I was recently testing Hive queries against an Iceberg table, and I found
that the performance is very poor, to the point of being almost unusable in a
production environment. In addition, join conditions cannot be pushed down to
the Iceberg partitions.
  I'm using the 1.3.1 Hive runtime jar from the Iceberg community.
  Currently I'm using Hive 3.1.3 and Iceberg 1.3.1.
  I'm very frustrated because the performance is so bad that I can't
deliver to my customers. How can I solve this problem?
  Details:
https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1695050248606629
I would be grateful if someone could guide me.