Re: Last Column showing blank in csv file

2016-11-30 Thread Leon Clayton
Are we sure there are no hidden characters at the end of one or more lines?
Thinking of a stray carriage return?

Try running dos2unix on the file to check this theory out.
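To check for hidden characters without dos2unix, here is a minimal Python sketch; the byte string is a stand-in for reading the actual copydata.csv in binary mode:

```python
# Stand-in for: data = open("copydata.csv", "rb").read()
data = b"sepalen,sepalwidth,class\r\n5.1,3.5,Iris-setosa\r\n"

crlf = data.count(b"\r\n")   # Windows-style CRLF line endings
lf = data.count(b"\n")       # all line endings
print(f"{crlf} of {lf} lines end in CRLF")

# Splitting on '\n' alone leaves the invisible '\r' on the last field.
last_field = data.split(b"\n")[1].split(b",")[-1]
print(repr(last_field))  # b'Iris-setosa\r'
```

If the CRLF count is non-zero, the file has Windows line endings and dos2unix (or re-saving with Unix line endings) should clear them.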


> On 30 Nov 2016, at 10:45, Sanjiv Kumar  wrote:
> 
> Hello
>  Yes, you are right: select * from `tmp.csv`
> works fine, but if I select specific columns, the data in the last column
> comes back blank.
> Run this query:
> select A.`sepalen`, A.`sepalwidth`, A.`patelen`, A.`patelwidth`, A.class
> from dfs.tmp.`copydata.csv` as A;
> 
> If you run this query, you will get the last column's data as blank.
> 
> On Wed, Nov 30, 2016 at 11:15 AM, Sanjiv Kumar  wrote:
> 
>> I am using the latest version, 1.8, on the Windows 10 operating system.
>> 
>> On Tue, Nov 29, 2016 at 11:40 AM, Sanjiv Kumar 
>> wrote:
>> 
>>> I already pasted the CSV file; just copy it and save it as .csv. I am also
>>> attaching the CSV file. The query is the same as I mentioned above.
>>> 
>>> select A.`sepalen`, A.`sepalwidth`, A.`patelen`, A.`patelwidth`, A.class
>>> from dfs.tmp.`copydata.csv` as A;
>>> 
>>> On Mon, Nov 28, 2016 at 6:30 PM, Sanjiv Kumar 
>>> wrote:
>>> 
 Yes, it's working, but what if I use:
  select A.`sepalen`, A.`sepalwidth`, A.`patelen`, A.`patelwidth`,
 A.class from dfs.tmp.`copydata.csv` as A;
 
 Why is my last column's data showing blank?
 And one more thing, see my CSV file:
 
 sepalen,sepalwidth,patelen,patelwidth,class
 5.1,3.5,1.4,Iris-setosa,0.2
 4.9,3,1.4,Iris-setosa,0.2
 4.7,3.2,1.3,Iris-setosa,0.2
 4.6,3.1,1.5,Iris-setosa,0.2
 5,3.6,1.4,Iris-setosa,0.2
 5.4,3.9,1.7,Iris-setosa,0.4
 4.6,3.4,1.4,Iris-setosa,0.3
 5,3.4,1.5,Iris-setosa,0.2
 4.4,2.9,1.4,Iris-setosa,0.2
 4.9,3.1,1.5,Iris-setosa,0.1
 5.4,3.7,1.5,Iris-setosa,0.2
 4.8,3.4,1.6,Iris-setosa,0.2
 
 This is my previous file.
 Now, if I add a comma after class, check this new file.
 
 sepalen,sepalwidth,patelen,patelwidth,class,
 5.1,3.5,1.4,Iris-setosa,0.2
 4.9,3,1.4,Iris-setosa,0.2
 4.7,3.2,1.3,Iris-setosa,0.2
 4.6,3.1,1.5,Iris-setosa,0.2
 5,3.6,1.4,Iris-setosa,0.2
 5.4,3.9,1.7,Iris-setosa,0.4
 4.6,3.4,1.4,Iris-setosa,0.3
 5,3.4,1.5,Iris-setosa,0.2
 4.4,2.9,1.4,Iris-setosa,0.2
 4.9,3.1,1.5,Iris-setosa,0.1
 5.4,3.7,1.5,Iris-setosa,0.2
 4.8,3.4,1.6,Iris-setosa,0.2
 
 
 And fire this query: select A.`sepalen`, A.`sepalwidth`, A.`patelen`,
 A.`patelwidth`, A.class from dfs.tmp.`copydata.csv` as A;
 
 Then the output shows fine, but if the comma is not there, the
 last column's data shows blank.
 
 Is this a bug?
 ..
  Thanks & Regards
  *Sanjiv Kumar*
 
>>> 
>>> 
>>> 
>>> --
>>> ..
>>>  Thanks & Regards
>>>  *Sanjiv Kumar*
>>> 
>> 
>> 
>> 
>> --
>> ..
>>  Thanks & Regards
>>  *Sanjiv Kumar*
>> 
> 
> 
> 
> -- 
> ..
>  Thanks & Regards
>  *Sanjiv Kumar*
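A likely mechanism for the trailing-comma behaviour described above, sketched in Python. This assumes the file was saved on Windows with CRLF line endings and that the header line is split on '\n' alone, so a stray '\r' survives; that is an assumption about the reader's behaviour, not something confirmed from the Drill source:

```python
# Header line as read from a file with Windows (CRLF) endings,
# split on '\n' only: the '\r' survives as part of the last field.
header_crlf = "sepalen,sepalwidth,patelen,patelwidth,class\r"
print(header_crlf.split(","))
# The last column is effectively named 'class\r', so selecting
# A.class matches no column and comes back blank.

# With a trailing comma, 'class' parses cleanly and the '\r'
# lands in an empty sixth column instead.
header_comma = "sepalen,sepalwidth,patelen,patelwidth,class,\r"
print(header_comma.split(","))
```

Under this reading, the trailing comma is masking the symptom rather than fixing the cause; converting the file to Unix line endings would address both.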



Re: tmp noexec

2016-07-25 Thread Leon Clayton

I moved /tmp off the local disk into the distributed file system, on a node-local volume 
on MapR. Other file systems can be substituted.

Open drill-override.conf on all of the nodes and insert this:

sort: {
  purge.threshold: 100,
  external: {
    batch.size: 4000,
    spill: {
      batch.size: 4000,
      group.size: 100,
      threshold: 200,
      directories: [ "/var/mapr/local/Hostname/drillspill" ],
      fs: "maprfs:///"
    }
  }
}

> On 25 Jul 2016, at 16:44, scott  wrote:
> 
> Hello,
> I've run into an issue where Drill will not start if mount permissions are
> set on /tmp to noexec. The permissions were set to noexec due to security
> concerns. I'm using Drill version 1.7. The error I get when starting Drill
> is:
> 
> Exception in thread "main" java.lang.UnsatisfiedLinkError:
> /tmp/libnetty-transport-native-epoll5743269078378802025.so:
> /tmp/libnetty-transport-native-epoll5743269078378802025.so: failed to map
> segment from shared object: Operation not permitted
> 
> Does anyone know of a way to configure Drill to use a different tmp
> location?
> 
> Thanks,
> Scott



Re: join fail

2016-05-09 Thread Leon Clayton
Did you increase the memory settings for Drill from the defaults?

https://drill.apache.org/docs/configuring-drill-memory/ 



> On 10 May 2016, at 02:25, lizhenm...@163.com wrote:
> 
> 
> Hi:
> I ran a join operation in Drill, using a broadcast join with the small table on 
> the right. The small table has 3200 rows. I have set 
> planner.broadcast_threshold to 1. The cluster has three nodes, and 
> every node has 64 GB of memory. While the join runs, memory usage keeps increasing 
> until the drillbit process exits. But the same query runs successfully in 
> Impala on the same cluster.
> Here is the query plan.
> 
> 00-00Screen : rowType = RecordType(VARCHAR(65535) sourceIP, DOUBLE 
> totalRevenue, ANY avgPageRank): rowcount = 1.0, cumulative cost = 
> {5.29400561759E8 rows, 6.356723058846001E10 cpu, 0.0 io, 
> 1.4803953770495996E11 network, 9.1066982688E8 memory}, id = 5015
> 00-01  Project(sourceIP=[$0], totalRevenue=[$1], avgPageRank=[$2]) : 
> rowType = RecordType(VARCHAR(65535) sourceIP, DOUBLE totalRevenue, ANY 
> avgPageRank): rowcount = 1.0, cumulative cost = {5.294005616585E8 rows, 
> 6.356723058836001E10 cpu, 0.0 io, 1.4803953770495996E11 network, 
> 9.1066982688E8 memory}, id = 5014
> 00-02SelectionVectorRemover : rowType = RecordType(VARCHAR(65535) 
> sourceIP, DOUBLE totalRevenue, ANY avgPageRank): rowcount = 1.0, cumulative 
> cost = {5.294005616585E8 rows, 6.356723058836001E10 cpu, 0.0 io, 
> 1.4803953770495996E11 network, 9.1066982688E8 memory}, id = 5013
> 00-03  Limit(fetch=[1]) : rowType = RecordType(VARCHAR(65535) 
> sourceIP, DOUBLE totalRevenue, ANY avgPageRank): rowcount = 1.0, cumulative 
> cost = {5.294005606585E8 rows, 6.356723058736001E10 cpu, 0.0 io, 
> 1.4803953770495996E11 network, 9.1066982688E8 memory}, id = 5012
> 00-04SingleMergeExchange(sort0=[1 DESC]) : rowType = 
> RecordType(VARCHAR(65535) sourceIP, DOUBLE totalRevenue, ANY avgPageRank): 
> rowcount = 457983.777, cumulative cost = {5.294005596585E8 rows, 
> 6.356723058336001E10 cpu, 0.0 io, 1.4803953770495996E11 network, 
> 9.1066982688E8 memory}, id = 5011
> 01-01  SelectionVectorRemover : rowType = 
> RecordType(VARCHAR(65535) sourceIP, DOUBLE totalRevenue, ANY avgPageRank): 
> rowcount = 457983.777, cumulative cost = {5.28942575879E8 rows, 
> 6.35617347781E10 cpu, 0.0 io, 1.4241183301631998E11 network, 
> 9.1066982688E8 memory}, id = 5010
> 01-02TopN(limit=[1]) : rowType = RecordType(VARCHAR(65535) 
> sourceIP, DOUBLE totalRevenue, ANY avgPageRank): rowcount = 
> 457983.777, cumulative cost = {5.28484592099E8 rows, 
> 6.356127679422001E10 cpu, 0.0 io, 1.4241183301631998E11 network, 
> 9.1066982688E8 memory}, id = 5009
> 01-03  Project(sourceIP=[$0], totalRevenue=[$1], 
> avgPageRank=[$2]) : rowType = RecordType(VARCHAR(65535) sourceIP, DOUBLE 
> totalRevenue, ANY avgPageRank): rowcount = 457983.777, cumulative 
> cost = {5.280266083193E8 rows, 6.356127679422001E10 cpu, 0.0 io, 
> 1.4241183301631998E11 network, 9.1066982688E8 memory}, id = 5008
> 01-04HashToRandomExchange(dist0=[[$1]]) : rowType = 
> RecordType(VARCHAR(65535) sourceIP, DOUBLE totalRevenue, ANY avgPageRank, ANY 
> E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 457983.777, cumulative cost = 
> {5.280266083193E8 rows, 6.356127679422001E10 cpu, 0.0 io, 
> 1.4241183301631998E11 network, 9.1066982688E8 memory}, id = 5007
> 02-01  UnorderedMuxExchange : rowType = 
> RecordType(VARCHAR(65535) sourceIP, DOUBLE totalRevenue, ANY avgPageRank, ANY 
> E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 457983.777, cumulative cost = 
> {5.275686245396E8 rows, 6.3553949053740005E10 cpu, 0.0 io, 
> 1.349082267647E11 network, 9.1066982688E8 memory}, id = 5006
> 03-01Project(sourceIP=[$0], totalRevenue=[$1], 
> avgPageRank=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($1)]) : rowType = 
> RecordType(VARCHAR(65535) sourceIP, DOUBLE totalRevenue, ANY avgPageRank, ANY 
> E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 457983.777, cumulative cost = 
> {5.2711064076E8 rows, 6.355349106996001E10 cpu, 0.0 io, 1.349082267647E11 
> network, 9.1066982688E8 memory}, id = 5005
> 03-02  Project(sourceIP=[$0], 
> totalRevenue=[CASE(=($4, 0), null, $3)], 
> avgPageRank=[CAST(/(CastHigh(CASE(=($2, 0), null, $1)), $2)):ANY NOT NULL]) : 
> rowType = RecordType(VARCHAR(65535) sourceIP, DOUBLE totalRevenue, ANY 
> avgPageRank): rowcount = 457983.777, cumulative cost = 
> {5.2665265698E8 rows, 6.3551659134840004E10 cpu, 0.0 io, 
> 1.349082267647E11 network, 9.1066982688E8 memory}, id = 5004
> 03-03HashAgg(group=[{0}], agg#0=[$SUM0($1)], 
> agg#1=[$SUM0($2)], agg#2=[$SUM0($3)], agg#3=[$SUM0($4)]) : rowType = 
> 

Re: drill exception

2016-05-04 Thread Leon Clayton
This mailing list does not allow attachments. Please host the image elsewhere 
or share the log files. 


> On 4 May 2016, at 07:31, rin tohsaka  wrote:
> 
> Hello:
>   When querying a Hive partitioned table in sqlline, I run into a problem. I 
> don't know how to resolve it. I took a screenshot of the exception and attached it 
> to this mail. Please help me!



Re: directory create CTAS behaviour

2016-01-14 Thread Leon Clayton
I don’t require the output to go to a single file; I just want to remove the 
subfolder creation at the output location.

Regards
 
Leon Clayton
Solutions Architect  | +44 (0)7530 980566
MapR Technologies UK Ltd.

Now Available - Free Hadoop On-Demand Training <http://www.mapr.com/training>
> On 14 Jan 2016, at 15:25, Jason Altekruse <altekruseja...@gmail.com> wrote:
> 
> The reason for creating a directory is that it allows us to write the data
> in parallel, writing different fragments of the query out to their own
> files. It would cost write performance if we enabled this as an option. As
> Drill consumes directories just like individual files, we didn't bother
> giving the option.
> 
> Is there some external tool you want to process the data with that
> necessitates you have a single file?
> 
> On Thu, Jan 14, 2016 at 6:25 AM, Leon Clayton <lclay...@maprtech.com> wrote:
> 
>> no folder, just the file in the location specified.
>> 
>>> On 14 Jan 2016, at 14:20, Neeraja Rentachintala <
>> nrentachint...@maprtech.com> wrote:
>>> 
>>> What would you like to see instead of directory.
>>> 
>>> On Thursday, January 14, 2016, Leon Clayton <lclay...@maprtech.com>
>> wrote:
>>> 
>>>> Hello All
>>>> 
>>>> Is it possible to change this behaviour? By default, a directory is
>>>> created, using the exact table name specified in the CTAS statement. I
>>>> don’t want a directory.
>>>> 
>>>> http://drill.apache.org/docs/create-table-as-ctas-command/
>>>> 
>>>> Regards
>>>> 
>>>> Leon Clayton
>>>> 
>>>> 
>>>> 
>> 
>> 



INSTR function

2015-09-08 Thread Leon Clayton
Hello All

Has anyone come up with a way to do the INSTR function within Apache Drill? INSTR 
returns the position of a substring in a string.
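For reference, INSTR's 1-based contract is easy to state in Python; on the Drill side, the standard POSITION('sub' IN 'str') syntax should behave the same way, though that is worth verifying against your Drill version's string-function documentation:

```python
def instr(s: str, sub: str) -> int:
    """INSTR-style search: 1-based position of sub in s, 0 if absent."""
    return s.find(sub) + 1

print(instr("Apache Drill", "Drill"))  # 8
print(instr("Apache Drill", "Spark"))  # 0
```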

Regards
 
Leon Clayton




Re: Query Failed: An Error Occurred

2015-06-17 Thread Leon Clayton
Just did this on the sandbox and it works fine.

0: jdbc:drill:> select count(1) from hive.orders;
+------------+
|   EXPR$0   |
+------------+
|   122000   |
+------------+

On 17 Jun 2015, at 14:22, Arthur Chan arthur.hk.c...@gmail.com wrote:

 Hi,
 
 I ran a simple one-table Hive query, select count(1) from
 hive.`default`.txn,
 and I got:
 
 Query Failed: An Error Occurred
 org.apache.drill.common.exceptions.UserRemoteException:
 SYSTEM ERROR: java.lang.RuntimeException: serious problem
 
 Please advise
 Regards



Does Drill support User Defined Functions (UDF)

2014-12-22 Thread Leon Clayton
Hello all

Does Drill support user-defined functions? If so, can you share how to use them, 
with an example? Thanks in advance.

Regards
 
 Leon Clayton
 lclay...@mapr.com