It seems that column `link_crtd_date` is of type `timestamp`, so you are
effectively partitioning by date including time, which produces a huge number
of directories (one per distinct timestamp). I assume your intent is to
partition by date (partition_date=yyyy-MM-dd or year=yyyy/month=MM/day=dd),
so you need to format/split the timestamp accordingly, for example:
-- partitioned by 'yyyy-MM-dd'
-- note: the dynamic partition column must come last in the SELECT list
INSERT OVERWRITE TABLE bsl12.email_edge_lyh_partitioned2
PARTITION (partition_date)
SELECT
  *,
  date_format(link_crtd_date, 'yyyy-MM-dd') AS partition_date
FROM bsl12.email_edge_lyh_mth1;
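Dynamic partitioning also has to be enabled in the session before such an
insert. Assuming default Hive settings (Spark's Hive support honors the same
properties):
-- enable dynamic partitions; 'nonstrict' allows all partition
-- columns to be dynamic, without a static partition prefix
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;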
-- partitioned by year/month/day
INSERT OVERWRITE TABLE bsl12.email_edge_lyh_partitioned2
PARTITION (year, month, day)
SELECT
  *,
  year(link_crtd_date) AS year,
  month(link_crtd_date) AS month,
  day(link_crtd_date) AS day
FROM bsl12.email_edge_lyh_mth1;
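Regarding the number of temporary files: during a dynamic partition insert,
each task opens one file per partition value it encounters. If you still see
too many files per partition after fixing the granularity, a common remedy is
to cluster the rows by the partition value before the insert, so that each
partition is written by a single task. A minimal sketch for the 'yyyy-MM-dd'
variant:
-- route all rows of a given day to the same task, so each
-- partition directory ends up with only a few files
INSERT OVERWRITE TABLE bsl12.email_edge_lyh_partitioned2
PARTITION (partition_date)
SELECT
  *,
  date_format(link_crtd_date, 'yyyy-MM-dd') AS partition_date
FROM bsl12.email_edge_lyh_mth1
DISTRIBUTE BY date_format(link_crtd_date, 'yyyy-MM-dd');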
Best Regards
Roland Johann
Software Developer/Data Engineer
phenetic GmbH
Lütticher Straße 10, 50674 Köln, Germany
Mobile: +49 172 365 26 46
Mail: [email protected]
Web: phenetic.io
Commercial register: Amtsgericht Köln (HRB 92595)
Managing directors: Roland Johann, Uwe Reimann
> On 23.08.2019 at 09:43, zhangliyun <[email protected]> wrote:
>
> Hi all:
> when I use the Spark dynamic partition feature, I met a problem with the
> HDFS quota. I found that it is very easy to hit the quota problem
> (exceeding the maximum quota of a directory).
>
> I have generated an unpartitioned table 'bsl12.email_edge_lyh_mth1', which
> contains 584M records, and will insert it into a partitioned table
> "bsl12.email_edge_lyh_partitioned2":
> --select count(*) from bsl12.email_edge_lyh_mth1; --584652128
> --INSERT OVERWRITE TABLE bsl12.email_edge_lyh_partitioned2 PARTITION
> (link_crtd_date) SELECT * FROM bsl12.email_edge_lyh_mth1;
>
>
> When I viewed the temporary directory while the SQL was running, I saw
> multiple files with link_crtd_date=2018-01-01***; I guess one temporary
> file per record. As there are 584M records in the unpartitioned table, are
> there any parameters for us to control the temporary file count to avoid
> the quota problem?
>
> ```
>
> 133  hdfs://horton/apps/risk/ars/datamart/email_edge_lyh_partitioned2/.hive-staging_hive_2019-08-22_19-41-38_747_7237025592628396381-1/-ext-10000/_temporary/0/_temporary/attempt_20190822195048_0000_m_001404_0/link_crtd_date=2018-01-01 12%3A35%3A29
> 137  hdfs://horton/apps/risk/ars/datamart/email_edge_lyh_partitioned2/.hive-staging_hive_2019-08-22_19-41-38_747_7237025592628396381-1/-ext-10000/_temporary/0/_temporary/attempt_20190822195048_0000_m_001404_0/link_crtd_date=2018-01-01 12%3A35%3A47
> 136  hdfs://horton/apps/risk/ars/datamart/email_edge_lyh_partitioned2/.hive-staging_hive_2019-08-22_19-41-38_747_7237025592628396381-1/-ext-10000/_temporary/0/_temporary/attempt_20190822195048_0000_m_001404_0/link_crtd_date=2018-01-01 12%3A38%3A23
> 132  hdfs://horton/apps/risk/ars/datamart/email_edge_lyh_partitioned2/.hive-staging_hive_2019-08-22_19-41-38_747_7237025592628396381-1/-ext-10000/_temporary/0/_temporary/attempt_20190822195048_0000_m_001404_0/link_crtd_date=2018-01-01 12%3A38%3A54
> 536  hdfs://horton/apps/risk/ars/datamart/email_edge_lyh_partitioned2/.hive-staging_hive_2019-08-22_19-41-38_747_7237025592628396381-1/-ext-10000/_temporary/0/_temporary/attempt_20190822195048_0000_m_001404_0/link_crtd_date=2018-01-01 12%3A40%3A01
> ```
>
> Best Regards
>
> Kelly Zhang
>