chenbodeng719 commented on issue #8279:
URL: https://github.com/apache/hudi/issues/8279#issuecomment-1493775969
Finally, it turns out the problem is that the record key string cannot contain the character ":". I think there should be a warning about this; it took me a while to figure out.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
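The failure mode described above can be sketched in a few lines: a bucket index that splits complex record keys of the form `field:value` on ":" and hashes only the value part will send every plain key that happens to contain ":" with the same suffix into the same bucket. This is a toy model of the reported behavior, not Hudi's actual code; the hash function and key format are illustrative assumptions.

```python
# Illustrative sketch of the ":" collision: keys are treated as "field:value"
# and only the value part is hashed, so plain keys sharing a ":suffix"
# all collapse into one bucket.

NUM_BUCKETS = 5

def bucket_id(record_key: str, num_buckets: int = NUM_BUCKETS) -> int:
    # Split on ":" and hash only the trailing part -- the trap.
    parts = record_key.split(":")
    hash_key = parts[-1] if len(parts) > 1 else record_key
    return sum(hash_key.encode()) % num_buckets  # toy hash, not Hudi's

keys = [
    "tiq_fb3c7524-206c-4cef-a87f-4e6379190f38:htmtalent",
    "tiq_0a1b2c3d-1111-2222-3333-444455556666:htmtalent",  # hypothetical second uid
]
buckets = {bucket_id(k) for k in keys}
print(buckets)  # both distinct uids collapse into a single bucket
```

With a million distinct uids that all end in the same ":suffix", every record hashes identically, which matches the single-parquet-file symptom reported below.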
chenbodeng719 commented on issue #8279:
URL: https://github.com/apache/hudi/issues/8279#issuecomment-1493753828
Is there some key length check? It's hard to believe there are so many conflicts.
chenbodeng719 commented on issue #8279:
URL: https://github.com/apache/hudi/issues/8279#issuecomment-1493728296
> Yes, seems some hash conflicts maybe.

But we have about 1 million uids like this; do all of them really hash to the same bucket? I can't believe it.
chenbodeng719 commented on issue #8279:
URL: https://github.com/apache/hudi/issues/8279#issuecomment-1493725310
It's weird. My uid keys look like the one below. It seems the bucket hash function treats them all as the same key.
```
|tiq_fb3c7524-206c-4cef-a87f-4e6379190f38:htmtalent
```
chenbodeng719 commented on issue #8279:
URL: https://github.com/apache/hudi/issues/8279#issuecomment-1493692921
> > I see this pr #8308. It seems that this feature hasn't been merged? So 0.13.0 doesn't support this feature?
>
> Spark support for bulk_insert with bucket index is on-going
chenbodeng719 commented on issue #8279:
URL: https://github.com/apache/hudi/issues/8279#issuecomment-1493552591
I see this PR: https://github.com/apache/hudi/pull/8308. It seems this feature hasn't been merged yet, so 0.13.0 doesn't support it?
chenbodeng719 commented on issue #8279:
URL: https://github.com/apache/hudi/issues/8279#issuecomment-1493546337
I tested upsert with the bucket index in Spark and it works fine, but bulk insert with the bucket index doesn't behave as I expected.
chenbodeng719 commented on issue #8279:
URL: https://github.com/apache/hudi/issues/8279#issuecomment-1493539406
> I'm sure they are unique. I tested upsert in Spark; it works as expected, but it's very slow.
>
> How many distinct uids do you have in your dataset?

I'm sur
chenbodeng719 commented on issue #8279:
URL: https://github.com/apache/hudi/issues/8279#issuecomment-1493538014
I'm sure they are unique. I tested upsert in Spark; it works as expected, but it's very slow.
chenbodeng719 commented on issue #8279:
URL: https://github.com/apache/hudi/issues/8279#issuecomment-1492970849
I use the conf below to test bulk insert, but it produces only one parquet file. Did I miss something? I expected 5 parquet files; my dataset is about 120GB.
```
CREATE TABLE hbase2hudi_
```
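For reference, a bucket-index bulk insert needs the index type and bucket count declared in the table options. A hedged sketch in Flink SQL follows; the table name, path, and columns are placeholders, the options shown are the standard Hudi Flink bucket-index settings, and the bucket count of 5 simply matches the 5 files expected above:

```
-- Sketch only: names and path are placeholders, not the original conf
CREATE TABLE hudi_sink (
  uid STRING PRIMARY KEY NOT ENFORCED,
  oridata STRING
) WITH (
  'connector' = 'hudi',
  'path' = 'file:///tmp/hudi_sink',
  'write.operation' = 'bulk_insert',
  'index.type' = 'BUCKET',
  'hoodie.bucket.index.num.buckets' = '5'
);
```

With `index.type` left unset, the bulk insert does not use the bucket index at all, which is consistent with the reply quoted in the next comment.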
chenbodeng719 commented on issue #8279:
URL: https://github.com/apache/hudi/issues/8279#issuecomment-1492884944
> You do not declare the index type as bucket while doing the bulk_insert.

So do you mean I should change my bulk insert conf like below?
```
CREATE TABLE 2hudi_
```
chenbodeng719 commented on issue #8279:
URL: https://github.com/apache/hudi/issues/8279#issuecomment-1492207358
- bulk insert conf
```
CREATE TABLE hbase2hudi_sink(
uid STRING PRIMARY KEY NOT ENFORCED,
oridata STRING,
update_time TIMESTAMP_LT
```