Re: StoreInKiteDataset help

2015-10-16 Thread Oleg Zhurakousky
Chris

Could you elaborate on your use case a bit more? Specifically, where is the 
source of the data you want to pump into Hive (e.g., streaming, bulk file load, 
etc.)?

Cheers
Oleg

On Oct 16, 2015, at 8:56 AM, Christopher Wilson wrote:

Joe, it was an HDP issue.  I didn't want to leap to blaming NiFi when the 
examples didn't work.  Thanks again.

Also, if there's a better way to pump data into Hive I'm all ears.

-Chris

On Fri, Oct 16, 2015 at 8:53 AM, Christopher Wilson wrote:
Joe, the first hurdle is to get ojdbc6.jar downloaded and installed in 
/usr/share/java.  There's a link created in /usr/hdp/2.3.0.0-2557/hive/lib/ but 
it points to nothing.
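For what it's worth, the "points to nothing" symptom is just a dangling symlink: HDP ships the link in hive/lib/ before the (license-gated) Oracle jar exists at its target. A toy reproduction in a scratch directory (stand-in paths, not the real install):

```shell
# HDP's link in hive/lib/ dangles until the jar it targets is dropped
# into place.  $tmp/java stands in for /usr/share/java here.
tmp=$(mktemp -d)
ln -s "$tmp/java/ojdbc6.jar" "$tmp/hive-lib-ojdbc6.jar"

[ -e "$tmp/hive-lib-ojdbc6.jar" ] || echo "dangling"    # target missing

mkdir -p "$tmp/java" && touch "$tmp/java/ojdbc6.jar"    # "install" the jar
[ -e "$tmp/hive-lib-ojdbc6.jar" ] && echo "resolves"    # link now works
```

So dropping the downloaded ojdbc6.jar into /usr/share/java should make the existing link resolve without touching hive/lib/ itself.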

Here's the hurdle I can't get past.  If you install kite-dataset from the web 
site and run through the example with debug and verbose turned on (command 
below), you get the output below.  It thinks mapreduce.tar.gz doesn't exist, 
but it does (see the listing at the bottom).  I've run this as users root and 
hdfs with no joy.  Thanks for looking.

debug=true ./kite-dataset -v csv-import sandwiches.csv sandwiches

WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/usr/hdp/2.3.0.0-2557/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/usr/hdp/2.3.0.0-2557/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
1 job failure(s) occurred:
org.kitesdk.tools.CopyTask: 
Kite(dataset:file:/tmp/0c1454eb-7831-4d6b-85a2-63a6cc8c51... ID=1 (1/1)(1): 
java.io.FileNotFoundException: File 
file:/hdp/apps/2.3.0.0-2557/mapreduce/mapreduce.tar.gz does not exist
at 
org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:606)
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:819)
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:596)
at 
org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:110)
at 
org.apache.hadoop.fs.AbstractFileSystem.resolvePath(AbstractFileSystem.java:467)
at org.apache.hadoop.fs.FilterFs.resolvePath(FilterFs.java:157)
at org.apache.hadoop.fs.FileContext$25.next(FileContext.java:2193)
at org.apache.hadoop.fs.FileContext$25.next(FileContext.java:2189)
at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
at org.apache.hadoop.fs.FileContext.resolve(FileContext.java:2189)
at org.apache.hadoop.fs.FileContext.resolvePath(FileContext.java:601)
at 
org.apache.hadoop.mapreduce.JobSubmitter.addMRFrameworkToDistributedCache(JobSubmitter.java:457)
at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:142)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at 
org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.submit(CrunchControlledJob.java:329)
at 
org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.startReadyJobs(CrunchJobControl.java:204)
at 
org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.pollJobStatusAndStartNewOnes(CrunchJobControl.java:238)
at 
org.apache.crunch.impl.mr.exec.MRExecutor.monitorLoop(MRExecutor.java:112)
at org.apache.crunch.impl.mr.exec.MRExecutor.access$000(MRExecutor.java:55)
at org.apache.crunch.impl.mr.exec.MRExecutor$1.run(MRExecutor.java:83)
at java.lang.Thread.run(Thread.java:745)

[hdfs@sandbox ~]$ hdfs dfs -ls /hdp/apps/2.3.0.0-2557/mapreduce
Found 2 items
-r--r--r--   1 hdfs hadoop 105893 2015-08-20 08:36 
/hdp/apps/2.3.0.0-2557/mapreduce/hadoop-streaming.jar
-r--r--r--   1 hdfs hadoop  207888607 2015-08-20 08:33 
/hdp/apps/2.3.0.0-2557/mapreduce/mapreduce.tar.gz
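One reading of that trace: the clue is the scheme. The job looked for file:/hdp/apps/... via RawLocalFileSystem, not hdfs://..., so the tarball does exist in HDFS but the client qualified the schemeless framework path (the one addMRFrameworkToDistributedCache consults) against a default filesystem of file:/// — which is what you get when the tool can't see the cluster's Hadoop config. A toy sketch of that resolution rule (my reading, not Hadoop's actual code):

```shell
# Toy stand-in for Hadoop path qualification: a path with no scheme is
# resolved against fs.defaultFS.  With no client config on the
# classpath, fs.defaultFS falls back to file:///, matching the error.
resolve() {
  local default_fs="$1" path="$2"
  case "$path" in
    *://*) echo "$path" ;;                  # already fully qualified
    *)     echo "${default_fs}${path}" ;;   # prepend the default filesystem
  esac
}

framework="/hdp/apps/2.3.0.0-2557/mapreduce/mapreduce.tar.gz"
resolve "file://" "$framework"                  # the failing lookup
resolve "hdfs://sandbox.hortonworks.com:8020" "$framework"   # the intended one
```

If that reading is right, pointing the tool at the cluster config before running it (e.g. `export HADOOP_CONF_DIR=/etc/hadoop/conf`, a guess based on the standard HDP layout) should make the unqualified path resolve against HDFS instead of the local disk.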


On Thu, Oct 15, 2015 at 3:22 PM, Joe Witt wrote:
Chris,

Are you seeing errors in NiFi or in HDP?  If you're seeing errors in
NiFi can you please send us the logs?

Thanks
Joe

On Thu, Oct 15, 2015 at 3:02 PM, Christopher Wilson wrote:
> Has anyone gotten Kite to work on HDP?  I'd wanted to do this very thing but
> am running into all kinds of issues with having .jar files not in the
> distributed cache (basically in /apps/hdp).
>
> Any feedback appreciated.
>
> -Chris
>
> On Sat, Sep 19, 2015 at 11:04 AM, Tyler Hawkes wrote: …

Re: StoreInKiteDataset help

2015-10-16 Thread Christopher Wilson
Use the example from the Kite web site.

http://kitesdk.org/docs/1.1.0/Install-Kite.html

http://kitesdk.org/docs/1.1.0/Using-the-Kite-CLI-to-Create-a-Dataset.html

Sorry for not being clear and thanks for the help.

-Chris



Re: StoreInKiteDataset help

2015-10-15 Thread Joe Witt
Chris,

Are you seeing errors in NiFi or in HDP?  If you're seeing errors in
NiFi can you please send us the logs?

Thanks
Joe

On Thu, Oct 15, 2015 at 3:02 PM, Christopher Wilson  wrote:
> Has anyone gotten Kite to work on HDP?  I'd wanted to do this very thing but
> am running into all kinds of issues with having .jar files not in the
> distributed cache (basically in /apps/hdp).
>
> Any feedback appreciated.
>
> -Chris
>
> On Sat, Sep 19, 2015 at 11:04 AM, Tyler Hawkes wrote:
>>
>> Thanks for the link. I'm using
>> "dataset:hive://hadoop01:9083/default/sandwiches". hadoop01 has hive on it.
>>
>> On Fri, Sep 18, 2015 at 7:36 AM Jeff  wrote:
>>>
>>> Not sure if this is what you are looking for but it has a bit on kite.
>>>
>>> http://ingest.tips/2014/12/22/getting-started-with-apache-nifi/
>>>
>>> -cb
>>>
>>>
>>> On Sep 18, 2015, at 8:32 AM, Bryan Bende  wrote:
>>>
>>> Hi Tyler,
>>>
>>> Unfortunately I don't think there are any tutorials on this. Can you
>>> provide an example of the dataset uri you specified that is showing as
>>> invalid?
>>>
>>> Thanks,
>>>
>>> Bryan
>>>
>>> On Fri, Sep 18, 2015 at 12:36 AM, Tyler Hawkes wrote:

 I'm just getting going on NiFi and trying to write data to Hive either
 from Kafka or an RDBMS. After setting up the hadoop configuration files and
 a target dataset uri it says the uri is invalid. I'm wondering if there's a
 tutorial on getting kite set up with my version of hive (HDP 2.2 running
 hive 0.14) and nifi since I've been unable to find anything on google or on
 the mailing list archive, and the documentation of StoreInKiteDataset is
 lacking a lot of detail.

 Any help on this would be greatly appreciated.
>>>
>>>
>>>
>