Re: [Python] - Dataset API - What's happening under the hood?

2022-09-15 Thread Aldrin
(oh, sorry I misread `pa.scalar` as `pc.scalar`, so please try `pyarrow.scalar` per the documentation) Aldrin Montana Computer Science PhD Student UC Santa Cruz On Thu, Sep 15, 2022 at 5:26 PM Aldrin wrote: > For Question 2: > At a glance, I don't see anything in adlfs or azure that is able

Re: [Python] - Dataset API - What's happening under the hood?

2022-09-15 Thread Aldrin
For Question 2: At a glance, I don't see anything in adlfs or azure that is able to do partial reads of a blob. If you're using block blobs, then likely you would want to store blocks of your file as separate blocks of a blob, and then you can do partial data transfers that way. I could be

Re: [c++][compute]Is there any other way to use Join besides Acero?

2022-09-15 Thread 1057445597
this jira https://issues.apache.org/jira/browse/ARROW-17740 1057445597 1057445...@qq.com ---- ??: "user"

Re: [c++][compute]Is there any other way to use Join besides Acero?

2022-09-15 Thread Niranda Perera
Hi, You can give pycylon a try [1]. It has a similar API endpoint in pycylon.dataframe interface [2]. Best [1] https://github.com/cylondata/cylon [2] https://github.com/cylondata/cylon/blob/main/python/pycylon/examples/dataframe/join.py On Thu, Sep 15, 2022 at 10:04 AM 1057445597

Re: [c++][compute]Is there any other way to use Join besides Acero?

2022-09-15 Thread Jacek Pliszka
Hi! Why don't you use the Arrow Table join directly? https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.join Be careful with join order, though, as speed may differ depending on the order of the joined tables. BR, Jacek On Thu, Sep 15, 2022 at 06:15 Weston