Lately, I’ve been experimenting with Kudu. It has been a much better experience 
than with HBase. Using it is much simpler, even from spark-shell.

spark-shell --packages org.apache.kudu:kudu-spark_2.10:1.0.0
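
Once the package is loaded, reading a Kudu table into a DataFrame takes only a few
lines. A minimal sketch (the master address and table name below are made up for
illustration; there is also a KuduContext API for upserts, if I recall correctly):

import org.apache.kudu.spark.kudu._

// kudu.master and kudu.table are placeholders -- substitute your own.
val df = sqlContext.read
  .options(Map("kudu.master" -> "kudu-master:7051", "kudu.table" -> "stock_quotes"))
  .kudu
df.show(5)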

It’s like going back to rudimentary DB systems where tables have just a primary 
key and columns. Additional benefits include a home-grown Spark package, fast 
upserts and table scans for analytics, recently introduced time-series support, 
and (my favorite) simpler configuration and administration. It has just reached 
version 1.0.0, so I’m waiting for 1.0.1+ to let some bugs shake out before I 
propose it as our HBase replacement. All my performance tests have been stellar 
versus HBase, especially given Kudu’s simplicity.

Just a thought…

Cheers,
Ben


> On Oct 3, 2016, at 8:40 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> 
> Hi,
> 
> I decided to create a composite key, ticker-date, from the csv file.
> 
> I just did some manipulation on the CSV file:
> 
> export IFS=",";sed -i 1d tsco.csv; cat tsco.csv | while read a b c d e f; do 
> echo "TSCO-$a,TESCO PLC,TSCO,$a,$b,$c,$d,$e,$f"; done > temp; mv -f temp 
> tsco.csv
> 
> Which basically takes the csv file, tells the shell that the field separator is 
> IFS=",", drops the header, reads every field in every line (a, b, c, ...), 
> creates the composite key TSCO-$a, and adds the stock name and ticker to each 
> line. The whole process can be automated and parameterised.
> 
> Once the csv file is put into HDFS, I run the following command:
> 
> $HBASE_HOME/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv 
> -Dimporttsv.separator=',' 
> -Dimporttsv.columns="HBASE_ROW_KEY,stock_info:stock,stock_info:ticker,stock_daily:Date,stock_daily:open,stock_daily:high,stock_daily:low,stock_daily:close,stock_daily:volume"
>  tsco hdfs://rhes564:9000/data/stocks/tsco.csv
> 
> The HBase table is created as below:
> 
> create 'tsco','stock_info','stock_daily'
> 
> and this is the data (2 rows, each with 2 column families and 8 attributes):
> 
> hbase(main):132:0> scan 'tsco', LIMIT => 2
> ROW                                                    COLUMN+CELL
>  TSCO-1-Apr-08                                         
> column=stock_daily:Date, timestamp=1475507091676, value=1-Apr-08
>  TSCO-1-Apr-08                                         
> column=stock_daily:close, timestamp=1475507091676, value=405.25
>  TSCO-1-Apr-08                                         
> column=stock_daily:high, timestamp=1475507091676, value=406.75
>  TSCO-1-Apr-08                                         
> column=stock_daily:low, timestamp=1475507091676, value=379.25
>  TSCO-1-Apr-08                                         
> column=stock_daily:open, timestamp=1475507091676, value=380.00
>  TSCO-1-Apr-08                                         
> column=stock_daily:volume, timestamp=1475507091676, value=49664486
>  TSCO-1-Apr-08                                         
> column=stock_info:stock, timestamp=1475507091676, value=TESCO PLC
>  TSCO-1-Apr-08                                         
> column=stock_info:ticker, timestamp=1475507091676, value=TSCO
>  
>  TSCO-1-Apr-09                                         
> column=stock_daily:Date, timestamp=1475507091676, value=1-Apr-09
>  TSCO-1-Apr-09                                         
> column=stock_daily:close, timestamp=1475507091676, value=333.30
>  TSCO-1-Apr-09                                         
> column=stock_daily:high, timestamp=1475507091676, value=334.60
>  TSCO-1-Apr-09                                         
> column=stock_daily:low, timestamp=1475507091676, value=326.50
>  TSCO-1-Apr-09                                         
> column=stock_daily:open, timestamp=1475507091676, value=331.10
>  TSCO-1-Apr-09                                         
> column=stock_daily:volume, timestamp=1475507091676, value=24877341
>  TSCO-1-Apr-09                                         
> column=stock_info:stock, timestamp=1475507091676, value=TESCO PLC
>  TSCO-1-Apr-09                                         
> column=stock_info:ticker, timestamp=1475507091676, value=TSCO
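> 
> For a quick sanity check from spark-shell, a row-key prefix scan over the new 
> composite keys could look like the sketch below (untested here; it assumes the 
> HBase 1.2 client jars mentioned elsewhere in this thread are on the classpath):
> 
> import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
> import org.apache.hadoop.hbase.client.{ConnectionFactory, Scan}
> import org.apache.hadoop.hbase.util.Bytes
> import scala.collection.JavaConverters._
> 
> // Prefix scan over the composite keys; "TSCO-" is the ticker prefix used above.
> val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
> val table = conn.getTable(TableName.valueOf("tsco"))
> val scanner = table.getScanner(new Scan().setRowPrefixFilter(Bytes.toBytes("TSCO-")))
> scanner.asScala.take(2).foreach { r =>
>   val close = Bytes.toString(r.getValue(Bytes.toBytes("stock_daily"), Bytes.toBytes("close")))
>   println(s"${Bytes.toString(r.getRow)} close=$close")
> }
> scanner.close(); conn.close()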
> 
> Any suggestions
> 
> Thanks
> 
> Dr Mich Talebzadeh
>  
> 
> On 3 October 2016 at 14:42, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> Or maybe add ticker+date, something like this:
> 
> 
> <image.png>
> 
> So the new row key would be TSCO-1-Apr-08, and this will be added as the row 
> key. Both Date and ticker will stay as they are, as column family attributes?
> 
> 
> 
> Dr Mich Talebzadeh
>  
> 
> On 3 October 2016 at 14:32, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> With ticker+date I can create something like below for the row key
> 
> TSCO_1-Apr-08 
> 
> 
> or TSCO1-Apr-08
> 
> if I understood you correctly
>                     
> 
> Dr Mich Talebzadeh
>  
> 
> On 3 October 2016 at 13:13, ayan guha <guha.a...@gmail.com> wrote:
> Hi
> 
> Looks like you are saving to new.csv but still loading tsco.csv? It's 
> definitely the header.
> 
> Suggestion: ticker+date as the row key has the following benefits:
> 
> 1. Using ticker+date as the row key will enable you to hold multiple tickers in 
> this single HBase table (think composite primary key).
> 2. Using the date alone as the row key will lead to hotspots (look up hotspotting 
> due to a monotonically increasing row key). To distribute the load, it is 
> suggested to use salting. The ticker can be used as a natural salt in this case.
> 3. Also, you may want to hash the row key value to make it a little more 
> flexible (think surrogate key); see the sketch after this list.
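> 
> A rough illustration of points 2 and 3 (the helper name and bucket count below 
> are made up, not taken from anyone's actual code): the ticker already acts as a 
> natural salt, and an optional hash-bucket prefix spreads writes further.
> 
> // Sketch only: SaltBuckets and saltedKey are illustrative names.
> val SaltBuckets = 16
> 
> def saltedKey(ticker: String, tradeDate: String): String = {
>   val natural = s"$ticker-$tradeDate"                          // e.g. TSCO-1-Apr-08
>   val bucket = (natural.hashCode & Int.MaxValue) % SaltBuckets // stable hash bucket
>   f"$bucket%02d-$natural"                                      // "NN-TSCO-1-Apr-08"
> }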
> 
> 
> 
> On Mon, Oct 3, 2016 at 10:17 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> Hi Ayan,
> 
> Sounds like the row key has to be unique, much like a primary key in an RDBMS.
> 
> This is what I download as a CSV for the stock from Google Finance:
> 
>   Date        Open    High    Low     Close   Volume
> 27-Sep-16     177.4   177.75  172.5   177.75  24117196
> 
> 
> So what I do is add the stock name and ticker myself to the end of each row via 
> a shell script and get rid of the header:
> 
> sed -i 1d tsco.csv; cat tsco.csv|awk '{print $0,",TESCO PLC,TSCO"}' > new.csv
> 
> The new table has two column families, stock_daily and stock_info, and the row 
> key is the date (one row per date).
> 
> This creates a new csv file with two additional columns appended to the end 
> of each line
> 
> Then I run the following command
> 
> $HBASE_HOME/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv 
> -Dimporttsv.separator=',' -Dimporttsv.columns="HBASE_ROW_KEY, 
> stock_daily:open, stock_daily:high, stock_daily:low, stock_daily:close, 
> stock_daily:volume, stock_info:stock, stock_info:ticker" tsco 
> hdfs://rhes564:9000/data/stocks/tsco.csv
> 
> This is in the HBase table for a given day:
> 
> hbase(main):090:0> scan 'tsco', LIMIT => 10
> ROW                                                    COLUMN+CELL
>  1-Apr-08                                              
> column=stock_daily:close, timestamp=1475492248665, value=405.25
>  1-Apr-08                                              
> column=stock_daily:high, timestamp=1475492248665, value=406.75
>  1-Apr-08                                              
> column=stock_daily:low, timestamp=1475492248665, value=379.25
>  1-Apr-08                                              
> column=stock_daily:open, timestamp=1475492248665, value=380.00
>  1-Apr-08                                              
> column=stock_daily:volume, timestamp=1475492248665, value=49664486
>  1-Apr-08                                              
> column=stock_info:stock, timestamp=1475492248665, value=TESCO PLC
>  1-Apr-08                                              
> column=stock_info:ticker, timestamp=1475492248665, value=TSCO
> 
>   
> But I also have this at the bottom
> 
>   Date                                                  
> column=stock_daily:close, timestamp=1475491189158, value=Close
>  Date                                                  
> column=stock_daily:high, timestamp=1475491189158, value=High
>  Date                                                  
> column=stock_daily:low, timestamp=1475491189158, value=Low
>  Date                                                  
> column=stock_daily:open, timestamp=1475491189158, value=Open
>  Date                                                  
> column=stock_daily:volume, timestamp=1475491189158, value=Volume
>  Date                                                  
> column=stock_info:stock, timestamp=1475491189158, value=TESCO PLC
>  Date                                                  
> column=stock_info:ticker, timestamp=1475491189158, value=TSCO
> 
> Sounds like the table header?
> 
> Dr Mich Talebzadeh
>  
> 
> On 3 October 2016 at 11:24, ayan guha <guha.a...@gmail.com> wrote:
> I am not well versed with ImportTsv, but you can create a CSV file using a 
> simple Spark program that makes the first column ticker+tradedate; a sketch is 
> below. I remember doing a similar manipulation to create the row key format in Pig.
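> 
> For example, something along these lines (a sketch only: the output directory is 
> an assumption, and the input layout Date,Open,High,Low,Close,Volume is taken 
> from the CSV shown earlier in the thread):
> 
> // Build "TSCO-<date>,TESCO PLC,TSCO,<original row>" and drop the header.
> val raw = sc.textFile("hdfs://rhes564:9000/data/stocks/tsco.csv")
> val header = raw.first()
> val keyed = raw.filter(_ != header).map { line =>
>   val tradeDate = line.split(",")(0)
>   s"TSCO-$tradeDate,TESCO PLC,TSCO,$line"
> }
> keyed.saveAsTextFile("hdfs://rhes564:9000/data/stocks/tsco_keyed")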
> 
> On 3 Oct 2016 20:40, "Mich Talebzadeh" <mich.talebza...@gmail.com> wrote:
> Thanks Ayan,
> 
> How do you specify ticker+tradedate as the row key in the below?
> 
> hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' 
> -Dimporttsv.columns="HBASE_ROW_KEY, stock_daily:ticker, 
> stock_daily:tradedate, 
> stock_daily:open,stock_daily:high,stock_daily:low,stock_daily:close,stock_daily:volume"
>  tsco hdfs://rhes564:9000/data/stocks/tsco.csv
> 
> I always thought that HBase takes the first column as the row key, so it takes 
> the stock as the row key, which is Tesco PLC for every row!
> 
> Does the row key need to be unique?
> 
> cheers
> 
> 
> Dr Mich Talebzadeh
>  
> 
> On 3 October 2016 at 10:30, ayan guha <guha.a...@gmail.com> wrote:
> Hi Mich
> 
> It is more to do with HBase than Spark.
> 
> The row key can be anything, yes, but essentially what you are doing is 
> inserting into and updating the single Tesco PLC row over and over. Given your 
> schema, ticker+tradedate seems to be a good row key.
> 
> On 3 Oct 2016 18:25, "Mich Talebzadeh" <mich.talebza...@gmail.com> wrote:
> Thanks again.
> 
> I added that jar file to the classpath and that part worked.
> 
> I was using spark-shell, so I have to use spark-submit for it to be able to 
> interact with the MapReduce job.
> 
> BTW, when I use the command-line utility ImportTsv to load a file into HBase 
> with the following table format
> 
> describe 'marketDataHbase'
> Table marketDataHbase is ENABLED
> marketDataHbase
> COLUMN FAMILIES DESCRIPTION
> {NAME => 'price_info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 
> 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 
> 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', 
> BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
> 1 row(s) in 0.0930 seconds
> 
> 
> hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' 
> -Dimporttsv.columns="HBASE_ROW_KEY, stock_daily:ticker, 
> stock_daily:tradedate, 
> stock_daily:open,stock_daily:high,stock_daily:low,stock_daily:close,stock_daily:volume"
>  tsco hdfs://rhes564:9000/data/stocks/tsco.csv
> 
> There are 1200 rows in the csv file, but it only loads the first row!
> 
> scan 'tsco'
> ROW                                                    COLUMN+CELL
>  Tesco PLC                                             
> column=stock_daily:close, timestamp=1475447365118, value=325.25
>  Tesco PLC                                             
> column=stock_daily:high, timestamp=1475447365118, value=332.00
>  Tesco PLC                                             
> column=stock_daily:low, timestamp=1475447365118, value=324.00
>  Tesco PLC                                             
> column=stock_daily:open, timestamp=1475447365118, value=331.75
>  Tesco PLC                                             
> column=stock_daily:ticker, timestamp=1475447365118, value=TSCO
>  Tesco PLC                                             
> column=stock_daily:tradedate, timestamp=1475447365118, value= 3-Jan-06
>  Tesco PLC                                             
> column=stock_daily:volume, timestamp=1475447365118, value=46935045
> 1 row(s) in 0.0390 seconds
> 
> Is this because the HBASE_ROW_KEY --> Tesco PLC is the same for every line? I 
> thought that the row key could be anything.
> 
> 
> 
> 
> 
> Dr Mich Talebzadeh
>  
> 
> On 3 October 2016 at 07:44, Benjamin Kim <bbuil...@gmail.com> wrote:
> We installed Apache Spark 1.6.0 alongside CDH 5.4.8 because Cloudera only had 
> Spark 1.3.0 at the time, and we wanted to use Spark 1.6.0’s features. We 
> borrowed the /etc/spark/conf/spark-env.sh file that Cloudera generated because 
> it was customized to add jars first from the paths listed in 
> /etc/spark/conf/classpath.txt. So, we entered the path for the htrace jar into 
> the /etc/spark/conf/classpath.txt file, and then it worked. We could read/write 
> to HBase. 
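> 
> For reference, the kind of read we mean looks roughly like the sketch below 
> from spark-shell (not our exact code; the table name is just an example):
> 
> import org.apache.hadoop.hbase.HBaseConfiguration
> import org.apache.hadoop.hbase.client.Result
> import org.apache.hadoop.hbase.io.ImmutableBytesWritable
> import org.apache.hadoop.hbase.mapreduce.TableInputFormat
> 
> // Point TableInputFormat at the table and pull it back as an RDD of Results.
> val hConf = HBaseConfiguration.create()
> hConf.set(TableInputFormat.INPUT_TABLE, "tsco")
> val hbaseRDD = sc.newAPIHadoopRDD(hConf, classOf[TableInputFormat],
>   classOf[ImmutableBytesWritable], classOf[Result])
> println(hbaseRDD.count())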
> 
>> On Oct 2, 2016, at 12:52 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>> 
>> Thanks Ben
>> 
>> The thing is I am using Spark 2 and no stack from CDH!
>> 
>> Is this approach to reading/writing to HBase specific to Cloudera?
>> 
>> 
>> 
>> 
>> 
>> Dr Mich Talebzadeh
>>  
>> 
>> On 1 October 2016 at 23:39, Benjamin Kim <bbuil...@gmail.com> wrote:
>> Mich,
>> 
>> I know up until CDH 5.4 we had to add the HTrace jar to the classpath to 
>> make it work using the command below. But after upgrading to CDH 5.7, it 
>> became unnecessary.
>> 
>> echo "/opt/cloudera/parcels/CDH/jars/htrace-core-3.2.0-incubating.jar" >> 
>> /etc/spark/conf/classpath.txt
>> 
>> Hope this helps.
>> 
>> Cheers,
>> Ben
>> 
>> 
>>> On Oct 1, 2016, at 3:22 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>> 
>>> Trying a bulk load using HFiles in Spark, as in the example below:
>>> 
>>> import org.apache.spark._
>>> import org.apache.spark.rdd.NewHadoopRDD
>>> import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor}
>>> import org.apache.hadoop.hbase.client.HBaseAdmin
>>> import org.apache.hadoop.hbase.mapreduce.TableInputFormat
>>> import org.apache.hadoop.fs.Path;
>>> import org.apache.hadoop.hbase.HColumnDescriptor
>>> import org.apache.hadoop.hbase.util.Bytes
>>> import org.apache.hadoop.hbase.client.Put;
>>> import org.apache.hadoop.hbase.client.HTable;
>>> import org.apache.hadoop.hbase.mapred.TableOutputFormat
>>> import org.apache.hadoop.mapred.JobConf
>>> import org.apache.hadoop.hbase.io.ImmutableBytesWritable
>>> import org.apache.hadoop.mapreduce.Job
>>> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
>>> import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
>>> import org.apache.hadoop.hbase.KeyValue
>>> import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat
>>> import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles
>>> 
>>> So far no issues.
>>> 
>>> Then I do
>>> 
>>> val conf = HBaseConfiguration.create()
>>> conf: org.apache.hadoop.conf.Configuration = Configuration: 
>>> core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, 
>>> yarn-default.xml, yarn-site.xml, hbase-default.xml, hbase-site.xml
>>> val tableName = "testTable"
>>> tableName: String = testTable
>>> 
>>> But this one fails:
>>> 
>>> scala> val table = new HTable(conf, tableName)
>>> java.io.IOException: java.lang.reflect.InvocationTargetException
>>>   at 
>>> org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
>>>   at 
>>> org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:431)
>>>   at 
>>> org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:424)
>>>   at 
>>> org.apache.hadoop.hbase.client.ConnectionManager.getConnectionInternal(ConnectionManager.java:302)
>>>   at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:185)
>>>   at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:151)
>>>   ... 52 elided
>>> Caused by: java.lang.reflect.InvocationTargetException: 
>>> java.lang.NoClassDefFoundError: org/apache/htrace/Trace
>>>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>   at 
>>> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>>>   at 
>>> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>>>   at 
>>> org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
>>>   ... 57 more
>>> Caused by: java.lang.NoClassDefFoundError: org/apache/htrace/Trace
>>>   at 
>>> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:216)
>>>   at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:419)
>>>   at 
>>> org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:65)
>>>   at 
>>> org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:105)
>>>   at 
>>> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.retrieveClusterId(ConnectionManager.java:905)
>>>   at 
>>> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:648)
>>>   ... 62 more
>>> Caused by: java.lang.ClassNotFoundException: org.apache.htrace.Trace
>>> 
>>> I have got all the jar files in spark-defaults.conf
>>> 
>>> spark.driver.extraClassPath      
>>> /home/hduser/jars/ojdbc6.jar:/home/hduser/jars/jconn4.jar:/home/hduser/jars/hbase-client-1.2.3.jar:/home/hduser/jars/hbase-server-1.2.3.jar:/home/hduser/jars/hbase-common-1.2.3.jar:/home/hduser/jars/hbase-protocol-1.2.3.jar:/home/hduser/jars/htrace-core-3.0.4.jar:/home/hduser/jars/hive-hbase-handler-2.1.0.jar
>>> spark.executor.extraClassPath    
>>> /home/hduser/jars/ojdbc6.jar:/home/hduser/jars/jconn4.jar:/home/hduser/jars/hbase-client-1.2.3.jar:/home/hduser/jars/hbase-server-1.2.3.jar:/home/hduser/jars/hbase-common-1.2.3.jar:/home/hduser/jars/hbase-protocol-1.2.3.jar:/home/hduser/jars/htrace-core-3.0.4.jar:/home/hduser/jars/hive-hbase-handler-2.1.0.jar
>>> 
>>> 
>>> and also in Spark shell where I test the code
>>> 
>>>  --jars 
>>> /home/hduser/jars/hbase-client-1.2.3.jar,/home/hduser/jars/hbase-server-1.2.3.jar,/home/hduser/jars/hbase-common-1.2.3.jar,/home/hduser/jars/hbase-protocol-1.2.3.jar,/home/hduser/jars/htrace-core-3.0.4.jar,/home/hduser/jars/hive-hbase-handler-2.1.0.jar
>>> 
>>> So any ideas will be appreciated.
>>> 
>>> Thanks
>>> 
>>> Dr Mich Talebzadeh
>>>  
>> 
> -- 
> Best Regards,
> Ayan Guha