Hi,
I already have a JSON dataset in S3, partitioned like the below:
edl_timestamp=2022090800
Now, the problem is that in the earlier 10 days of data collection there was a
duplicate-columns issue, because of which we couldn't read the data.
The latest 10 days of data are fine, though. So, I am trying
Hi Team,
Can somebody help?
Thanks,
Sid
On Sun, Jun 19, 2022 at 3:51 PM Sid wrote:
Hi Team,
I still need help in understanding how reading works exactly.
Thanks,
Sid
On Mon, Jun 20, 2022 at 2:23 PM Sid wrote:
…reading files. The one that I know about is Parquet. Like this link explains:
Spark: Understand the Basic of Pushed Filter and Partition Filter Using Parquet File
<https://medium.com/@songkunjump/spark-understand-the-basic-of-pushed-filter-and-partition-filter-using-parquet-file-3e5789e260bd>

On Tue, Jul 5, 2022 at 9:21 PM Sid wrote:

> Hi Team,
>
> I still need help in understanding how reading works exactly.
>
> Thanks,
> Sid
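Since the question is about how reading works: when a filter references only the
partition column (here `edl_timestamp`), Spark resolves it against the directory
names and never opens the files inside the skipped partitions. A rough stand-alone
sketch of that pruning step, in plain Python with made-up partition values
(`prune_partitions` and the sample directory list are illustrative, not Spark API):

```python
# Partition pruning in miniature: a filter on the partition column is
# applied to the directory names, so excluded partitions are never read.

def prune_partitions(partition_dirs, min_timestamp):
    """Keep only directories whose edl_timestamp value is >= min_timestamp."""
    kept = []
    for d in partition_dirs:
        key, _, value = d.partition("=")
        # Fixed-width YYYYMMDDHH stamps compare correctly as strings.
        if key == "edl_timestamp" and value >= min_timestamp:
            kept.append(d)
    return kept

# Hypothetical partition layout: one directory per hourly load.
dirs = [
    "edl_timestamp=2022090800",
    "edl_timestamp=2022091800",
    "edl_timestamp=2022092800",
]

# Only partitions at or after 2022091800 would be scanned; the earlier
# (bad) partitions are skipped without reading any of their data files.
print(prune_partitions(dirs, "2022091800"))
```

In PySpark itself the same effect would come from something like
`spark.read.json(path).filter("edl_timestamp >= '2022091800'")`, where the
predicate on the partition column shows up as a PartitionFilter in the physical
plan instead of a scan over the bad files.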