I am not well versed with ImportTsv, but you can create a CSV file with a simple Spark program that makes the first column ticker+tradedate. I remember doing a similar manipulation to build the row key format in Pig.
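Something along these lines should do it; a rough sketch in Spark 2 / Scala, where the column order (stock, ticker, tradedate, open, high, low, close, volume, read off your ImportTsv column list) and the output path are assumptions, so adjust them to the real tsco.csv layout:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.concat_ws

val spark = SparkSession.builder().appName("tscoRowKeyPrep").getOrCreate()

// assuming the file columns line up with the ImportTsv list:
// stock, ticker, tradedate, open, high, low, close, volume
val df = spark.read.csv("hdfs://rhes564:9000/data/stocks/tsco.csv")
  .toDF("stock", "ticker", "tradedate", "open", "high", "low", "close", "volume")

// replace the leading stock-name column with a composite ticker_tradedate key
val withKey = df.select(
  concat_ws("_", df("ticker"), df("tradedate")).as("rowkey"),
  df("ticker"), df("tradedate"), df("open"), df("high"),
  df("low"), df("close"), df("volume"))

// hypothetical output location; point ImportTsv at this directory instead of tsco.csv
withKey.write.csv("hdfs://rhes564:9000/data/stocks/tsco_with_key")

With that file your existing -Dimporttsv.columns list should line up unchanged, because ImportTsv takes the row key from whichever field position HBASE_ROW_KEY occupies in the list (the first one here), and each ticker+tradedate key is unique per trading day.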
On 3 Oct 2016 20:40, "Mich Talebzadeh" <mich.talebza...@gmail.com> wrote:

> Thanks Ayan,
>
> How do you specify ticker+tradedate as the row key in the below?
>
> hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' \
>   -Dimporttsv.columns="HBASE_ROW_KEY,stock_daily:ticker,stock_daily:tradedate,stock_daily:open,stock_daily:high,stock_daily:low,stock_daily:close,stock_daily:volume" \
>   tsco hdfs://rhes564:9000/data/stocks/tsco.csv
>
> I always thought that HBase takes the first column as the row key, so here it takes the stock name, which is Tesco PLC for every row!
>
> Does the row key need to be unique?
>
> cheers
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> Disclaimer: Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 3 October 2016 at 10:30, ayan guha <guha.a...@gmail.com> wrote:
>
>> Hi Mich
>>
>> It is more to do with HBase than Spark.
>>
>> The row key can be anything, yes, but essentially what you are doing is inserting into and updating the single Tesco PLC row over and over. Given your schema, ticker+tradedate seems to be a good row key.
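>>
>> To see why, here is a minimal sketch with the plain HBase client API (the values are made up, only the keys matter). Two puts with the same row key land in the same row, so a scan only ever shows one "Tesco PLC" row, whereas composite keys give one row per ticker+tradedate:
>>
>> import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
>> import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
>> import org.apache.hadoop.hbase.util.Bytes
>>
>> val conf = HBaseConfiguration.create()
>> val connection = ConnectionFactory.createConnection(conf)
>> val table = connection.getTable(TableName.valueOf("tsco"))
>> val cf = Bytes.toBytes("stock_daily")
>>
>> // same row key twice: the second put simply overwrites the first value
>> table.put(new Put(Bytes.toBytes("Tesco PLC"))
>>   .addColumn(cf, Bytes.toBytes("close"), Bytes.toBytes("325.25")))
>> table.put(new Put(Bytes.toBytes("Tesco PLC"))
>>   .addColumn(cf, Bytes.toBytes("close"), Bytes.toBytes("326.10")))
>>
>> // composite keys: one row per ticker_tradedate, nothing gets lost
>> table.put(new Put(Bytes.toBytes("TSCO_3-Jan-06"))
>>   .addColumn(cf, Bytes.toBytes("close"), Bytes.toBytes("325.25")))
>> table.put(new Put(Bytes.toBytes("TSCO_4-Jan-06"))
>>   .addColumn(cf, Bytes.toBytes("close"), Bytes.toBytes("322.00")))
>>
>> table.close()
>> connection.close()
>>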
>> On 3 Oct 2016 18:25, "Mich Talebzadeh" <mich.talebza...@gmail.com> wrote:
>>
>>> thanks again.
>>>
>>> I added that jar file to the classpath and that part worked.
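>>>
>>> For example, something along these lines now goes through in the shell (a quick check using the Connection API rather than the deprecated HTable constructor; the table name is just the test one from my earlier mail):
>>>
>>> import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
>>> import org.apache.hadoop.hbase.client.ConnectionFactory
>>>
>>> val conf = HBaseConfiguration.create()
>>> // this is the call that used to throw NoClassDefFoundError: org/apache/htrace/Trace
>>> val connection = ConnectionFactory.createConnection(conf)
>>> val table = connection.getTable(TableName.valueOf("testTable"))
>>> println(table.getName)
>>> table.close()
>>> connection.close()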
>>>
>>> I was using the Spark shell, so I will have to use spark-submit for it to be able to interact with the MapReduce job.
>>>
>>> BTW, when I use the command-line utility ImportTsv to load a file into HBase with the following table format:
>>>
>>> describe 'marketDataHbase'
>>> Table marketDataHbase is ENABLED
>>> marketDataHbase
>>> COLUMN FAMILIES DESCRIPTION
>>> {NAME => 'price_info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
>>> 1 row(s) in 0.0930 seconds
>>>
>>>
>>> hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator=',' \
>>>   -Dimporttsv.columns="HBASE_ROW_KEY,stock_daily:ticker,stock_daily:tradedate,stock_daily:open,stock_daily:high,stock_daily:low,stock_daily:close,stock_daily:volume" \
>>>   tsco hdfs://rhes564:9000/data/stocks/tsco.csv
>>>
>>> There are 1200 rows in the CSV file, but it only loads the first row!
>>>
>>> scan 'tsco'
>>> ROW                COLUMN+CELL
>>>  Tesco PLC         column=stock_daily:close, timestamp=1475447365118, value=325.25
>>>  Tesco PLC         column=stock_daily:high, timestamp=1475447365118, value=332.00
>>>  Tesco PLC         column=stock_daily:low, timestamp=1475447365118, value=324.00
>>>  Tesco PLC         column=stock_daily:open, timestamp=1475447365118, value=331.75
>>>  Tesco PLC         column=stock_daily:ticker, timestamp=1475447365118, value=TSCO
>>>  Tesco PLC         column=stock_daily:tradedate, timestamp=1475447365118, value= 3-Jan-06
>>>  Tesco PLC         column=stock_daily:volume, timestamp=1475447365118, value=46935045
>>> 1 row(s) in 0.0390 seconds
>>>
>>> Is this because the HBASE_ROW_KEY --> Tesco PLC is the same for every row? I thought that the row key could be anything.
>>>
>>>
>>>
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>>
>>> On 3 October 2016 at 07:44, Benjamin Kim <bbuil...@gmail.com> wrote:
>>>
>>>> We installed Apache Spark 1.6.0 alongside CDH 5.4.8 because Cloudera only had Spark 1.3.0 at the time, and we wanted Spark 1.6.0's features. We borrowed the /etc/spark/conf/spark-env.sh file that Cloudera generated, because it was customized to add jars first from the paths listed in /etc/spark/conf/classpath.txt. So we entered the path for the htrace jar into /etc/spark/conf/classpath.txt, and then it worked: we could read/write to HBase.
>>>>
>>>> On Oct 2, 2016, at 12:52 AM, Mich Talebzadeh <mich.talebza...@gmail.com>
>>>> wrote:
>>>>
>>>> Thanks Ben
>>>>
>>>> The thing is, I am using Spark 2 and no CDH stack!
>>>>
>>>> Is this approach to reading/writing to HBase specific to Cloudera?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>>
>>>>
>>>> On 1 October 2016 at 23:39, Benjamin Kim <bbuil...@gmail.com> wrote:
>>>>
>>>>> Mich,
>>>>>
>>>>> I know that up until CDH 5.4 we had to add the HTrace jar to the classpath to make it work, using the command below. But after upgrading to CDH 5.7, it became unnecessary.
>>>>>
>>>>> echo "/opt/cloudera/parcels/CDH/jars/htrace-core-3.2.0-incubating.jar" >> /etc/spark/conf/classpath.txt
>>>>>
>>>>> Hope this helps.
>>>>>
>>>>> Cheers,
>>>>> Ben
>>>>>
>>>>>
>>>>> On Oct 1, 2016, at 3:22 PM, Mich Talebzadeh <mich.talebza...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Trying a bulk load using HFiles in Spark, as in the example below:
>>>>>
>>>>> import org.apache.spark._
>>>>> import org.apache.spark.rdd.NewHadoopRDD
>>>>> import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor}
>>>>> import org.apache.hadoop.hbase.client.HBaseAdmin
>>>>> import org.apache.hadoop.hbase.mapreduce.TableInputFormat
>>>>> import org.apache.hadoop.fs.Path;
>>>>> import org.apache.hadoop.hbase.HColumnDescriptor
>>>>> import org.apache.hadoop.hbase.util.Bytes
>>>>> import org.apache.hadoop.hbase.client.Put;
>>>>> import org.apache.hadoop.hbase.client.HTable;
>>>>> import org.apache.hadoop.hbase.mapred.TableOutputFormat
>>>>> import org.apache.hadoop.mapred.JobConf
>>>>> import org.apache.hadoop.hbase.io.ImmutableBytesWritable
>>>>> import org.apache.hadoop.mapreduce.Job
>>>>> import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
>>>>> import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
>>>>> import org.apache.hadoop.hbase.KeyValue
>>>>> import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat
>>>>> import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles
>>>>>
>>>>> So far no issues.
>>>>>
>>>>> Then I do
>>>>>
>>>>> val conf = HBaseConfiguration.create()
>>>>> conf: org.apache.hadoop.conf.Configuration = Configuration:
>>>>> core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml,
>>>>> yarn-default.xml, yarn-site.xml, hbase-default.xml, hbase-site.xml
>>>>> val tableName = "testTable"
>>>>> tableName: String = testTable
>>>>>
>>>>> But this one fails:
>>>>>
>>>>> scala> val table = new HTable(conf, tableName)
>>>>> java.io.IOException: java.lang.reflect.InvocationTargetException
>>>>>   at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:240)
>>>>>   at org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:431)
>>>>>   at org.apache.hadoop.hbase.client.ConnectionManager.createConnection(ConnectionManager.java:424)
>>>>>   at org.apache.hadoop.hbase.client.ConnectionManager.getConnectionInternal(ConnectionManager.java:302)
>>>>>   at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:185)
>>>>>   at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:151)
>>>>>   ... 52 elided
>>>>> Caused by: java.lang.reflect.InvocationTargetException: java.lang.NoClassDefFoundError: org/apache/htrace/Trace
>>>>>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>>>>   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>>>>>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>>>>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>>>>>   at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:238)
>>>>>   ... 57 more
>>>>> Caused by: java.lang.NoClassDefFoundError: org/apache/htrace/Trace
>>>>>   at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:216)
>>>>>   at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:419)
>>>>>   at org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:65)
>>>>>   at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:105)
>>>>>   at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.retrieveClusterId(ConnectionManager.java:905)
>>>>>   at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.<init>(ConnectionManager.java:648)
>>>>>   ... 62 more
>>>>> Caused by: java.lang.ClassNotFoundException: org.apache.htrace.Trace
>>>>>
>>>>> I have got all the jar files in spark-defaults.conf
>>>>>
>>>>> spark.driver.extraClassPath      /home/hduser/jars/ojdbc6.jar:/home/hduser/jars/jconn4.jar:/home/hduser/jars/hbase-client-1.2.3.jar:/home/hduser/jars/hbase-server-1.2.3.jar:/home/hduser/jars/hbase-common-1.2.3.jar:/home/hduser/jars/hbase-protocol-1.2.3.jar:/home/hduser/jars/htrace-core-3.0.4.jar:/home/hduser/jars/hive-hbase-handler-2.1.0.jar
>>>>> spark.executor.extraClassPath    /home/hduser/jars/ojdbc6.jar:/home/hduser/jars/jconn4.jar:/home/hduser/jars/hbase-client-1.2.3.jar:/home/hduser/jars/hbase-server-1.2.3.jar:/home/hduser/jars/hbase-common-1.2.3.jar:/home/hduser/jars/hbase-protocol-1.2.3.jar:/home/hduser/jars/htrace-core-3.0.4.jar:/home/hduser/jars/hive-hbase-handler-2.1.0.jar
>>>>>
>>>>>
>>>>> and also in the Spark shell where I test the code:
>>>>>
>>>>> --jars /home/hduser/jars/hbase-client-1.2.3.jar,/home/hduser/jars/hbase-server-1.2.3.jar,/home/hduser/jars/hbase-common-1.2.3.jar,/home/hduser/jars/hbase-protocol-1.2.3.jar,/home/hduser/jars/htrace-core-3.0.4.jar,/home/hduser/jars/hive-hbase-handler-2.1.0.jar
>>>>>
>>>>> So any ideas will be appreciated.
>>>>>
>>>>> Thanks
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>
