Re: Partitioned tables in SQL client configuration.

2020-12-03 Thread Maciek Próchniak

Hi Jark,

thanks for answer. I'm a bit puzzled, because in my yaml I'm using  
"connector: filesystem" (not connector.type). I don't think I end up using


https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/connect.html#file-system-connector 
- this connector as partitioning and orc format are handled correctly.



It's also not clear for me what is "not legacy" connector for reading 
files directly from filesystem (no Hive). I don't see any implementation 
of DynamicTableSourceFactory which would do this.


I assumed that using DDL I wrote below also gives me 
FileSystemTableFactory, am I wrong?



thanks,

maciek



On 03.12.2020 16:26, Jark Wu wrote:
Only legacy connectors (`connector.type=kafka` instead of 
`connector=kafka`) are supported in the YAML at the moment. You can 
use regular DDL instead. There is a similar discussion in 
https://issues.apache.org/jira/browse/FLINK-20260 
 these days.


Best,
Jark

On Thu, 3 Dec 2020 at 00:52, Till Rohrmann > wrote:


Hi Maciek,

I am pulling in Timo who might help you with this problem.

Cheers,
Till

On Tue, Dec 1, 2020 at 6:51 PM Maciek Próchniak mailto:m...@touk.pl>> wrote:

Hello,

I try to configure SQL Client to query partitioned ORC data on
local
filesystem. I have directory structure like that:

/tmp/table1/startdate=2020-11-28

/tmp/table1/startdate=2020-11-27

etc.


If I run SQL Client session and create table by hand:

create table tst (column1 string, startdate string)
partitioned by
(startdate) with ('connector'='filesystem', 'format'='orc',
'path'='/tmp/table1');

everything runs fine:

explain select * from tst where startdate='2020-11-27'

shows that only one partition in 'readPartitions'


However, I struggle to configure table in .yaml config.

I tried like this (after some struggle, as "partition.keys"
setting
doesn't seem to be documented...) :

tables:
   - name: tst2
 type: source-table
 connector: filesystem
 path: "/tmp/table1"
 format: orc
 partition.keys:
   - name: startdate
 schema:
   - name: column1
 data-type: string
   - name: startdate
 data-type: string

and it more or less works - queries are executed properly.
However,
partitions are not pruned:

explain select * from tst2 where startdate='2020-11-27'

show all partitions in 'readPartitions'


Any idea what can be wrong? I'm using Flink 1.11.2


thanks,

maciek




Re: Partitioned tables in SQL client configuration.

2020-12-03 Thread Jark Wu
Only legacy connectors (`connector.type=kafka` instead of
`connector=kafka`) are supported in the YAML at the moment. You can use
regular DDL instead. There is a similar discussion in
https://issues.apache.org/jira/browse/FLINK-20260 these days.

Best,
Jark

On Thu, 3 Dec 2020 at 00:52, Till Rohrmann  wrote:

> Hi Maciek,
>
> I am pulling in Timo who might help you with this problem.
>
> Cheers,
> Till
>
> On Tue, Dec 1, 2020 at 6:51 PM Maciek Próchniak  wrote:
>
>> Hello,
>>
>> I try to configure SQL Client to query partitioned ORC data on local
>> filesystem. I have directory structure like that:
>>
>> /tmp/table1/startdate=2020-11-28
>>
>> /tmp/table1/startdate=2020-11-27
>>
>> etc.
>>
>>
>> If I run SQL Client session and create table by hand:
>>
>> create table tst (column1 string, startdate string) partitioned by
>> (startdate) with ('connector'='filesystem', 'format'='orc',
>> 'path'='/tmp/table1');
>>
>> everything runs fine:
>>
>> explain select * from tst where startdate='2020-11-27'
>>
>> shows that only one partition in 'readPartitions'
>>
>>
>> However, I struggle to configure table in .yaml config.
>>
>> I tried like this (after some struggle, as "partition.keys" setting
>> doesn't seem to be documented...) :
>>
>> tables:
>>- name: tst2
>>  type: source-table
>>  connector: filesystem
>>  path: "/tmp/table1"
>>  format: orc
>>  partition.keys:
>>- name: startdate
>>  schema:
>>- name: column1
>>  data-type: string
>>- name: startdate
>>  data-type: string
>>
>> and it more or less works - queries are executed properly. However,
>> partitions are not pruned:
>>
>> explain select * from tst2 where startdate='2020-11-27'
>>
>> show all partitions in 'readPartitions'
>>
>>
>> Any idea what can be wrong? I'm using Flink 1.11.2
>>
>>
>> thanks,
>>
>> maciek
>>
>>
>>


Re: Partitioned tables in SQL client configuration.

2020-12-02 Thread Till Rohrmann
Hi Maciek,

I am pulling in Timo who might help you with this problem.

Cheers,
Till

On Tue, Dec 1, 2020 at 6:51 PM Maciek Próchniak  wrote:

> Hello,
>
> I try to configure SQL Client to query partitioned ORC data on local
> filesystem. I have directory structure like that:
>
> /tmp/table1/startdate=2020-11-28
>
> /tmp/table1/startdate=2020-11-27
>
> etc.
>
>
> If I run SQL Client session and create table by hand:
>
> create table tst (column1 string, startdate string) partitioned by
> (startdate) with ('connector'='filesystem', 'format'='orc',
> 'path'='/tmp/table1');
>
> everything runs fine:
>
> explain select * from tst where startdate='2020-11-27'
>
> shows that only one partition in 'readPartitions'
>
>
> However, I struggle to configure table in .yaml config.
>
> I tried like this (after some struggle, as "partition.keys" setting
> doesn't seem to be documented...) :
>
> tables:
>- name: tst2
>  type: source-table
>  connector: filesystem
>  path: "/tmp/table1"
>  format: orc
>  partition.keys:
>- name: startdate
>  schema:
>- name: column1
>  data-type: string
>- name: startdate
>  data-type: string
>
> and it more or less works - queries are executed properly. However,
> partitions are not pruned:
>
> explain select * from tst2 where startdate='2020-11-27'
>
> show all partitions in 'readPartitions'
>
>
> Any idea what can be wrong? I'm using Flink 1.11.2
>
>
> thanks,
>
> maciek
>
>
>


Partitioned tables in SQL client configuration.

2020-12-01 Thread Maciek Próchniak

Hello,

I try to configure SQL Client to query partitioned ORC data on local 
filesystem. I have directory structure like that:


/tmp/table1/startdate=2020-11-28

/tmp/table1/startdate=2020-11-27

etc.


If I run SQL Client session and create table by hand:

create table tst (column1 string, startdate string) partitioned by 
(startdate) with ('connector'='filesystem', 'format'='orc', 
'path'='/tmp/table1');


everything runs fine:

explain select * from tst where startdate='2020-11-27'

shows that only one partition in 'readPartitions'


However, I struggle to configure table in .yaml config.

I tried like this (after some struggle, as "partition.keys" setting 
doesn't seem to be documented...) :


tables:
  - name: tst2
    type: source-table
    connector: filesystem
    path: "/tmp/table1"
    format: orc
    partition.keys:
  - name: startdate
    schema:
  - name: column1
    data-type: string
  - name: startdate
    data-type: string

and it more or less works - queries are executed properly. However, 
partitions are not pruned:


explain select * from tst2 where startdate='2020-11-27'

show all partitions in 'readPartitions'


Any idea what can be wrong? I'm using Flink 1.11.2


thanks,

maciek