Re: storage plugin for simple web server

2016-10-10 Thread Andries Engelbrecht
Can you make an NFS connection to the web server?

Then maybe just use a local fs storage plugin with the NFS mount as the 
workspace.

I have not tried it myself, but it may be an option to test in your case.
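A minimal sketch of this idea follows. It assumes the web server's folder is NFS-mounted on every Drillbit node at a hypothetical path `/mnt/webdata`, and that the Drill Web UI/REST endpoint listens on the default port 8047 without authentication. It builds a `file`-type storage plugin whose read-only workspace points at the mount, then POSTs it to Drill's `/storage/{name}.json` REST endpoint. The plugin name `webnfs` and both helper functions are hypothetical.

```python
import json
import urllib.request


def nfs_plugin_payload(name, mount_path):
    """Build a Drill storage-plugin payload for a local-filesystem plugin
    whose read-only workspace is an NFS mount (hypothetical path)."""
    return {
        "name": name,
        "config": {
            "type": "file",
            "enabled": True,
            # Plain local filesystem; the NFS mount makes the remote files visible here.
            "connection": "file:///",
            "workspaces": {
                "root": {
                    "location": mount_path,
                    "writable": False,
                    "defaultInputFormat": None,
                }
            },
            "formats": {
                "csv": {"type": "text", "extensions": ["csv"], "delimiter": ","}
            },
        },
    }


def register_plugin(payload, drill_url="http://localhost:8047"):
    """POST the payload to Drill's REST API (assumes no authentication)."""
    req = urllib.request.Request(
        f"{drill_url}/storage/{payload['name']}.json",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the payload; call register_plugin() against a live Drillbit.
    print(json.dumps(nfs_plugin_payload("webnfs", "/mnt/webdata"), indent=2))
```

Once registered, a file could then be queried as webnfs.root.`myfile1.csv`, just like any other file in a dfs-style workspace.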

--Andries


> On Oct 7, 2016, at 11:39 AM, Di Pe wrote:
> 
> Hi,
> 
> I have a couple of hundred CSV files on a web server that I can pull down
> via HTTPS without any credentials. I wonder how I can write a storage
> plugin for Drill that pulls these files directly from the web server
> without having to download them to the local file system.
> 
> I have a couple of options:
> 
> 1) the plugin could just do a simple HTTP directory listing to get these
> files
> 2) I could provide a text file with the URLs of the files, simply like:
>    https://mywebserver.com/myfolder/myfile1.csv
>    https://mywebserver.com/myfolder/myfile2.csv
> 3) the web server supports a JSON file listing like this:
>    curl -s https://mywebserver.com/myfolder?format=json | python -m json.tool
>    [
>      {
>        "hash": "e5f62378c79ec9c491aa130374dba93b",
>        "last_modified": "2016-09-30T19:15:45.730950",
>        "bytes": 211169,
>        "name": "myfile1.csv",
>        "content_type": "text/csv"
>      },
>      {
> 
> Option 3 would be the most elegant to me.
> 
> 
> Does something like this already exist, or would I duplicate the S3 plugin
> and modify it?
> 
> Would it look something like this?
> 
> Thanks for your help!
> dipe
> 
> 
> {
>   "type": "file",
>   "enabled": true,
>   "connection": "https://mywebserver.com/myfolder?format=json",
>   "config": null,
>   "workspaces": {
>     "root": {
>       "location": "/",
>       "writable": false,
>       "defaultInputFormat": null
>     },
>     "tmp": {
>       "location": "/tmp",
>       "writable": true,
>       "defaultInputFormat": null
>     }
>   },
>   "formats": {
>     "psv": {
>       "type": "text",
>       "extensions": [
>         "tbl"
>       ],
>       "delimiter": "|"
>     },
>     "csv": {
>       "type": "text",
>       "extensions": [
>         "csv"
>       ],
>       "delimiter": ","
>     },
>     "tsv": {
>       "type": "text",
>       "extensions": [
>         "tsv"
>       ],
>       "delimiter": "\t"
>     },
>     "parquet": {
>       "type": "parquet"
>     },
>     "json": {
>       "type": "json",
>       "extensions": [
>         "json"
>       ]
>     },
>     "avro": {
>       "type": "avro"
>     },
>     "sequencefile": {
>       "type": "sequencefile",
>       "extensions": [
>         "seq"
>       ]
>     },
>     "csvh": {
>       "type": "text",
>       "extensions": [
>         "csvh"
>       ],
>       "extractHeader": true,
>       "delimiter": ","
>     }
>   }
> }
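For option 3, the discovery step amounts to turning the server's JSON listing into a list of file URLs. The sketch below shows only that logic, under stated assumptions: each entry carries the `name` and `content_type` fields seen in the sample output, only `text/csv` entries are wanted, and the listing is reachable via the `?format=json` query with no credentials. The function names are hypothetical, and this is not a Drill plugin; an actual storage plugin would have to perform the equivalent step in Java inside Drill.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def csv_urls_from_listing(listing_json, base_url):
    """Turn the server's JSON listing (option 3) into full file URLs.

    Assumes each entry has "name" and "content_type" fields, as in the
    sample output; only entries served as text/csv are kept.
    """
    entries = json.loads(listing_json)
    base = base_url.rstrip("/")
    return [
        f"{base}/{quote(entry['name'])}"
        for entry in entries
        if entry.get("content_type") == "text/csv"
    ]


def fetch_listing(base_url):
    """Download the listing via the ?format=json query (no credentials)."""
    with urlopen(f"{base_url.rstrip('/')}?format=json") as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Offline example using the entry shown in the quoted listing.
    sample = """[{"hash": "e5f62378c79ec9c491aa130374dba93b",
                 "last_modified": "2016-09-30T19:15:45.730950",
                 "bytes": 211169, "name": "myfile1.csv",
                 "content_type": "text/csv"}]"""
    for url in csv_urls_from_listing(sample, "https://mywebserver.com/myfolder"):
        print(url)
```

Filtering on `content_type` rather than the file extension keeps the sketch aligned with what the server itself reports; filtering on `name.endswith(".csv")` would work equally well if the server's content types are unreliable.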



Re: IN operator can take how many inputs ?

2016-10-10 Thread Andries Engelbrecht
Here is the link for upgrading Drill on MapR 5.1:
http://maprdocs.mapr.com/51/Drill/upgrading_drill_5.1.html

Drill 1.8 is supported on MapR 5.1:
http://maprdocs.mapr.com/home/InteropMatrix/r_eco_matrix.html


--Andries


> On Oct 8, 2016, at 1:35 AM, Nicolas Paris wrote:
> 
> On Sat, Oct 8, 2016 at 05:55, Jinfeng Ni wrote:
> 
>> With a larger value for this option, a big IN-list may not be converted
>> into a subquery, so query performance might be suboptimal.
>> 
> 
> My use case is predicate push-down to an external JDBC database.
> 
> 
>> On Fri, Oct 7, 2016 at 6:54 PM, Gautam Parai wrote:
>>> Yes, this was fixed in Drill 1.8; please let us know if this does not work
>>> in 1.8. As Jinfeng mentioned earlier, the option can be set to a very large
>>> value (LONG type).
>> 
> 
> I am not sure I am able to upgrade the Drill version on the MapR 5.1
> distribution. I didn't find any documentation. Do you have any?
> 
> 
>>> Gautam
>>> 
>>> On Fri, Oct 7, 2016 at 2:29 AM, Tushar Pathare wrote:
>>> 
 Hello,
 
 I think it is fixed in 1.8. Try it.
 
 On Fri, Oct 7, 2016 at 12:18 PM +0300, "Nicolas Paris" <
 nipari...@gmail.com> wrote:
 
 Hi,
 
 I am on Drill 1.6 with the MapR 5.1 distribution.
 I get this error:
 Error: VALIDATION ERROR: The option 'planner.in_subquery_threshold' does not exist
 
 
 On Mon, Oct 3, 2016 at 17:18, Jinfeng Ni wrote:
 
> You can modify the option `planner.in_subquery_threshold`. By default,
> it is set to 20. That is the threshold at which the planner decides to
> convert an IN-list into a subquery.
> 
> select * from sys.options where name like '%in_subquery%';
> 
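> +-------------------------------+------+--------+---------+---------+------------+----------+-----------+
> | name                          | kind | type   | status  | num_val | string_val | bool_val | float_val |
> +-------------------------------+------+--------+---------+---------+------------+----------+-----------+
> | planner.in_subquery_threshold | LONG | SYSTEM | DEFAULT | 20      | null       | null     | null      |
> +-------------------------------+------+--------+---------+---------+------------+----------+-----------+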
> 
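On Drill 1.8 or later (the thread reports the option does not exist in 1.6), the threshold can be raised with `ALTER SYSTEM SET` or `ALTER SESSION SET`. The sketch below builds such a statement and submits it through Drill's REST `/query.json` endpoint, assuming a Drillbit on localhost:8047 without authentication; the helper names and the example value 1000 are hypothetical. Keep Jinfeng's caveat in mind: a very large value means big IN-lists are no longer converted into a subquery, which can make some queries slower.

```python
import json
import urllib.request

OPTION = "planner.in_subquery_threshold"


def set_threshold_sql(value, scope="SYSTEM"):
    """Build an ALTER statement for the IN-list threshold (a LONG option)."""
    if scope not in ("SYSTEM", "SESSION"):
        raise ValueError("scope must be SYSTEM or SESSION")
    if not isinstance(value, int) or value < 1:
        raise ValueError("threshold must be a positive integer")
    return f"ALTER {scope} SET `{OPTION}` = {value}"


def run_sql(sql, drill_url="http://localhost:8047"):
    """Submit one SQL statement to Drill's REST /query.json endpoint."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    req = urllib.request.Request(
        f"{drill_url}/query.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Prints the statement only; pass it to run_sql() against a live Drillbit.
    print(set_threshold_sql(1000))
```

The same statement can of course be typed directly into sqlline; the sketch defaults to SYSTEM scope, which changes the value cluster-wide rather than for one session.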
>> On Sun, Oct 2, 2016 at 12:56 AM, Tushar Pathare wrote:
>> Hello Team,
>> 
>> A SELECT clause with an IN operator causes an issue if the number of
>> params goes beyond 19. Is this tunable, or is it a limitation?
>> If the number of params is made less than 19, the same SELECT
>> statement works.
>> 
>> Select something……
>> 
>> IN ( '94479 ', '296979 ', '219579 ', '109179 ', '97179 ', '223179 ',
>> '96279 ', '224979 ', '282879 ', '33279 ', '277179 ', '177879 ', '272049 ',
>> '49179 ', '104049 ', '177879 ', '272049 ', '49179 ', '104049 ', '104049 ')
>> 
>> Error thrown:
>> org.apache.drill.common.exceptions.UserRemoteException: DATA_READ ERROR:
>> The JDBC storage plugin failed while trying setup the SQL query
>> 
>> Tushar B Pathare
>> High Performance Computing (HPC) Administrator
>> General Parallel File System
>> Scientific Computing
>> Bioinformatics Division
>> Research
>> 
>> Sidra Medical and Research Centre
>> Sidra OPC Building
>> PO Box 26999  |  Doha, Qatar
>> Near QNCC,5th Floor
>> Office 4003  ext 37443 | M +974 74793547
>> tpath...@sidra.org | www.sidra.org
>> 
>> Disclaimer: This email and its attachments may be confidential and are
>> intended solely for the use of the individual to whom it is addressed. If
>> you are not the intended recipient, any reading, printing, storage,
>> disclosure, copying or any other action taken in respect of this e-mail is
>> prohibited and may be unlawful. If you are not the intended recipient,
>> please notify the sender immediately by using the reply function and then
>> permanently delete what you have received. Any views or opinions expressed
>> are solely those of the author and do not necessarily represent those of
>> Sidra Medical and Research Center.
> 