Akshay, I don't understand why you can't use Drill to create the parquet files. Can you say more?
Is there a language constraint? A process constraint? As I hear it, you are asking "I don't want to use Drill to create parquet, I want to use something else". The problem is that there are tons of other ways. I start with not understanding your needs (coz I think Drill is the easiest way for me to create parquet files) and then have no idea which direction you are headed. Just a little more definition could help me (and others) help you. On Thu, May 27, 2021 at 8:18 AM Akshay Bhasin (BLOOMBERG/ 731 LEX) < abhasi...@bloomberg.net> wrote: > Hi Drill Team, > > I've another ques - is there a python parquet module you provide/support > which I can leverage to create .parquet & .parquet.crc files which drill > creates. > > I currently have a drill cluster & I want to use it for reading the data > but not creating the parquet files. > > I'm aware of other modules, but I want to preserve the speed & > optimization of drill - so particularly looking at the module which drill > uses to convert files to parquet & parquet.crc. > > My end goal here is to have a drill cluster reading data from s3 & a > separate process to convert data to parquet & parquet.crc files & upload it > to s3. > > Best, > Akshay > > From: ted.dunn...@gmail.com At: 04/27/21 17:37:43 UTC-4:00 > To: Akshay Bhasin (BLOOMBERG/ 731 LEX ) <abhasi...@bloomberg.net> > Cc: dev@drill.apache.org > Subject: Re: Feature/Question > > > Akshay, > > That's great news! > > On Tue, Apr 27, 2021 at 1:10 PM Akshay Bhasin (BLOOMBERG/ 731 LEX) < > abhasi...@bloomberg.net> wrote: > >> Hi Ted, >> >> Thanks for reaching out. Yes - the below worked successfully. >> >> I was able to create different objects in s3 like 'XXX/YYY/filename', >> 'XXX/ZZZ/filename' and able to query like >> SELECT * FROM XXX. >> >> Thanks ! >> >> Best, >> Akshay >> >> From: ted.dunn...@gmail.com At: 04/21/21 17:21:42 >> To: Akshay Bhasin (BLOOMBERG/ 731 LEX ) <abhasi...@bloomberg.net>, >> dev@drill.apache.org >> Subject: Re: Feature/Question >> >> >> Akshay, >> >> Yes. It is possible to do what you want from a few different angles. >> >> As you have noted, S3 doesn't have directories. Not really. On the other >> hand, people simulate this using naming schemes and S3 has some support for >> this. >> >> One of the simplest ways to deal with this is to create a view that >> explicitly mentions every S3 object that you have in your table. The >> contents of this view can get a bit cumbersome, but that shouldn't be a >> problem since users never need to know. You will need to set up a scheduled >> action to update this view occasionally, but that is pretty simple. >> >> The other way is to use a naming scheme with a delimiter such as /. This >> is described at >> https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-prefixes.html >> If you do that and have files named (for instance) foo/a.json, >> foo/b.json, foo/c.json and you query >> >> select * from s3.`foo` >> >> you should see the contents of a.json, b.json and c.json. See here for >> commentary >> <https://stackoverflow.com/questions/44785065/apache-drill-how-to-query-all-files-in-an-s3-bucket> >> >> I haven't tried this, however, so I am simply going on the reports of >> others. If this works for you, please report your success back here. >> >> >> >> >> >> On Wed, Apr 21, 2021 at 11:34 AM Akshay Bhasin (BLOOMBERG/ 731 LEX) < >> abhasi...@bloomberg.net> wrote: >> >>> Hi Drill Community, >>> >>> I'm Akshay and I'm using Drill for a project I'm working on. >>> >>> There is this particular use case I want to implement - I want to know >>> if its possible. >>> >>> 1) Currently, we have a partition of file system and we create a view on >>> top of it. For example, we have below directory structure - >>> >>> /home/product/product_name/year/month/day/*parquet >>> /home/product/product_name_2/year/month/day/*parquet >>> /home/product/product_name_3/year/month/day/*parquetdev >>> >>> Now, we create a view over it - >>> Create view temp AS SELECT `dir0` AS prod, `dir1` as year, `dir2` as >>> month, `dir3` as day, * from dfs.`/home/product`; >>> >>> Then, we can query all the data dynamically - >>> SELECT * from temp LIMIT 5; >>> >>> 2) Now I want to replicate this behavior via s3. I want to ask if its >>> possible - I was able to create a logical directory. But s3 inherently does >>> not support directories only objects. >>> >>> Therefore, I was curious to know if it is supported/way to do this. I >>> was unable to find any documentation on your website related to >>> partitioning data on s3. >>> >>> Thanks for your help. >>> Best, >>> Akshay >> >> >> >