Re: Problem Updating Stats

2016-03-19 Thread Benjamin Kim
Ankit,

I did not see any problems when connecting with the Phoenix sqlline client. So, 
below is what you asked for. I hope that you can give us some insight into 
fixing this.

hbase(main):005:0> describe 'SYSTEM.STATS'
Table SYSTEM.STATS is ENABLED

SYSTEM.STATS, {TABLE_ATTRIBUTES => {coprocessor$1 =>
'|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|',
coprocessor$2 => '|org.apache.phoenix.coprocessor.UngroupedAggregateRegionObserver|805306366|',
coprocessor$3 => '|org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver|805306366|',
coprocessor$4 => '|org.apache.phoenix.coprocessor.ServerCachingEndpointImpl|805306366|',
coprocessor$5 => '|org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint|805306366|',
coprocessor$6 => '|org.apache.hadoop.hbase.regionserver.LocalIndexSplitter|805306366|',
METADATA => {'SPLIT_POLICY' => 'org.apache.phoenix.schema.MetaDataSplitPolicy'}}

COLUMN FAMILIES DESCRIPTION

{NAME => '0', DATA_BLOCK_ENCODING => 'FAST_DIFF', BLOOMFILTER => 'ROW',
REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE',
MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'true',
BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}

1 row(s) in 0.0280 seconds

Thanks,
Ben


> On Mar 15, 2016, at 11:59 PM, Ankit Singhal <ankitsingha...@gmail.com> wrote:
> 
> Yes, it seems so.
> Did you get any error related to SYSTEM.STATS when the client connected for the
> first time?
> 
> Can you please describe your SYSTEM.STATS table and paste the output here?
> 
> On Wed, Mar 16, 2016 at 3:24 AM, Benjamin Kim <bbuil...@gmail.com 
> <mailto:bbuil...@gmail.com>> wrote:
> When trying to run UPDATE STATISTICS on an existing table in HBase, I get an error:
> Update stats:
> UPDATE STATISTICS "ops_csv" ALL
> error:
> ERROR 504 (42703): Undefined column. columnName=REGION_NAME
> It looks like the metadata information is messed up, i.e. there is no column
> named REGION_NAME in this table.
> I see similar errors for other tables that we currently have in HBase.
> 
> We are using CDH 5.5.2, HBase 1.0.0, and Phoenix 4.5.2.
> 
> Thanks,
> Ben
> 



Re: Problem Updating Stats

2016-03-19 Thread Benjamin Kim
I got it to work by uninstalling Phoenix and reinstalling it. I had to wipe all 
components clean first.

Thanks,
Ben

> On Mar 16, 2016, at 10:47 AM, Ankit Singhal <ankitsingha...@gmail.com> wrote:
> 
> It seems from the attached logs that you have upgraded Phoenix to version 4.7
> and are now using an old client to connect to it.
> The "UPDATE STATISTICS" command and guideposts will not work with an old client
> after upgrading to 4.7; you need to use the new client for such operations.
> 
> On Wed, Mar 16, 2016 at 10:55 PM, Benjamin Kim <bbuil...@gmail.com 
> <mailto:bbuil...@gmail.com>> wrote:
> | TABLE_CAT | TABLE_SCHEM | TABLE_NAME | COLUMN_NAME            |    |
> +-----------+-------------+------------+------------------------+----+
> |           | SYSTEM      | STATS      | PHYSICAL_NAME          | 12 |
> |           | SYSTEM      | STATS      | COLUMN_FAMILY          | 12 |
> |           | SYSTEM      | STATS      | GUIDE_POST_KEY         | -3 |
> |           | SYSTEM      | STATS      | GUIDE_POSTS_WIDTH      | -5 |
> |           | SYSTEM      | STATS      | LAST_STATS_UPDATE_TIME | 91 |
> |           | SYSTEM      | STATS      | GUIDE_POSTS_ROW_COUNT  | -5 |
> 
> I have attached the SYSTEM.CATALOG contents.
> 
> Thanks,
> Ben
> 
> 
> 
>> On Mar 16, 2016, at 9:34 AM, Ankit Singhal <ankitsingha...@gmail.com 
>> <mailto:ankitsingha...@gmail.com>> wrote:
>> 
>> Sorry Ben, I may not have been clear in my first comment, but I need you to
>> describe SYSTEM.STATS in some SQL client so that I can see the columns present.
>> And also please run scan 'SYSTEM.CATALOG', {RAW=>true} in the hbase shell and
>> attach the output here.
>> 
>> On Wed, Mar 16, 2016 at 8:55 PM, Benjamin Kim <bbuil...@gmail.com 
>> <mailto:bbuil...@gmail.com>> wrote:
>> Ankit,
>> 
>> I did not see any problems when connecting with the phoenix sqlline client. 
>> So, below is the what you asked for. I hope that you can give us insight 
>> into fixing this.
>> 
>> hbase(main):005:0> describe 'SYSTEM.STATS'
>> Table SYSTEM.STATS is ENABLED
>>  
>>
>> SYSTEM.STATS, {TABLE_ATTRIBUTES => {coprocessor$1 => 
>> '|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|', 
>> coprocessor$2 => '|org.apache.phoenix.coprocessor.UngroupedAggr
>> egateRegionObserver|805306366|', coprocessor$3 => 
>> '|org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver|805306366|', 
>> coprocessor$4 => '|org.apache.phoenix.coprocessor.Serv
>> erCachingEndpointImpl|805306366|', coprocessor$5 => 
>> '|org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint|805306366|', 
>> coprocessor$6 => '|org.apache.hadoop.hbase.regionserv
>> er.LocalIndexSplitter|805306366|', METADATA => {'SPLIT_POLICY' => 
>> 'org.apache.phoenix.schema.MetaDataSplitPolicy'}}
>>   
>> COLUMN FAMILIES DESCRIPTION  
>>  
>>
>> {NAME => '0', DATA_BLOCK_ENCODING => 'FAST_DIFF', BLOOMFILTER => 'ROW', 
>> REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', 
>> MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP
>> _DELETED_CELLS => 'true', BLOCKSIZE => '65536', IN_MEMORY => 'false', 
>> BLOCKCACHE => 'true'}
>>       
>> 1 row(s) in 0.0280 seconds
>> 
>> Thanks,
>> Ben
>> 
>> 
>>> On Mar 15, 2016, at 11:59 PM, Ankit Singhal <ankitsingha...@gmail.com 
>>> <mailto:ankitsingha...@gmail.com>> wrote:
>>> 
>>> Yes it seems to. 
>>> Did you get any error related to SYSTEM.STATS when the client is connected 
>>> first time ?
>>> 
>>> can you please describe your system.stats table and paste the output here.
>>> 
>>> On Wed, Mar 16, 2016 at 3:24 AM, Benjamin Kim <bbuil...@gmail.com 
>>> <mailto:bbuil...@gmail.com>> wrote:
>>> When trying to run update status on an existing table in hbase, I get error:
>>> Update stats:
>>> UPDATE STATISTICS "ops_csv" ALL
>>> error:
>>> ERROR 504 (42703): Undefined column. columnName=REGION_NAME
>>> Looks like the meta data information is messed up, ie. there is no column 
>>> with name REGION_NAME in this table.
>>> I see similar errors for other tables that we currently have in hbase.
>>> 
>>> We are using CDH 5.5.2, HBase 1.0.0, and Phoenix 4.5.2.
>>> 
>>> Thanks,
>>> Ben
>>> 
>> 
>> 
> 
> 
> 



Re: Problem Updating Stats

2016-03-18 Thread Benjamin Kim
Ankit,

We tried a 4.7 client upgrade to use the Phoenix Spark client as an experiment and 
then rolled back to the sanctioned CDH 5.5 version of 4.5. I had no idea that 
someone ran an "update stats" during that period, and I didn't know that there 
would be a fundamental change such as this. Do you know of a way to roll back this 
change too?

Thanks,
Ben 


> On Mar 16, 2016, at 10:47 AM, Ankit Singhal <ankitsingha...@gmail.com> wrote:
> 
> It seems from the attached logs that you have upgraded Phoenix to version 4.7
> and are now using an old client to connect to it.
> The "UPDATE STATISTICS" command and guideposts will not work with an old client
> after upgrading to 4.7; you need to use the new client for such operations.
> 
> On Wed, Mar 16, 2016 at 10:55 PM, Benjamin Kim <bbuil...@gmail.com 
> <mailto:bbuil...@gmail.com>> wrote:
> | TABLE_CAT | TABLE_SCHEM | TABLE_NAME | COLUMN_NAME            |    |
> +-----------+-------------+------------+------------------------+----+
> |           | SYSTEM      | STATS      | PHYSICAL_NAME          | 12 |
> |           | SYSTEM      | STATS      | COLUMN_FAMILY          | 12 |
> |           | SYSTEM      | STATS      | GUIDE_POST_KEY         | -3 |
> |           | SYSTEM      | STATS      | GUIDE_POSTS_WIDTH      | -5 |
> |           | SYSTEM      | STATS      | LAST_STATS_UPDATE_TIME | 91 |
> |           | SYSTEM      | STATS      | GUIDE_POSTS_ROW_COUNT  | -5 |
> 
> I have attached the SYSTEM.CATALOG contents.
> 
> Thanks,
> Ben
> 
> 
> 
>> On Mar 16, 2016, at 9:34 AM, Ankit Singhal <ankitsingha...@gmail.com 
>> <mailto:ankitsingha...@gmail.com>> wrote:
>> 
>> Sorry Ben, I may not have been clear in my first comment, but I need you to
>> describe SYSTEM.STATS in some SQL client so that I can see the columns present.
>> And also please run scan 'SYSTEM.CATALOG', {RAW=>true} in the hbase shell and
>> attach the output here.
>> 
>> On Wed, Mar 16, 2016 at 8:55 PM, Benjamin Kim <bbuil...@gmail.com 
>> <mailto:bbuil...@gmail.com>> wrote:
>> Ankit,
>> 
>> I did not see any problems when connecting with the phoenix sqlline client. 
>> So, below is the what you asked for. I hope that you can give us insight 
>> into fixing this.
>> 
>> hbase(main):005:0> describe 'SYSTEM.STATS'
>> Table SYSTEM.STATS is ENABLED
>>  
>>
>> SYSTEM.STATS, {TABLE_ATTRIBUTES => {coprocessor$1 => 
>> '|org.apache.phoenix.coprocessor.ScanRegionObserver|805306366|', 
>> coprocessor$2 => '|org.apache.phoenix.coprocessor.UngroupedAggr
>> egateRegionObserver|805306366|', coprocessor$3 => 
>> '|org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver|805306366|', 
>> coprocessor$4 => '|org.apache.phoenix.coprocessor.Serv
>> erCachingEndpointImpl|805306366|', coprocessor$5 => 
>> '|org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint|805306366|', 
>> coprocessor$6 => '|org.apache.hadoop.hbase.regionserv
>> er.LocalIndexSplitter|805306366|', METADATA => {'SPLIT_POLICY' => 
>> 'org.apache.phoenix.schema.MetaDataSplitPolicy'}}
>>   
>> COLUMN FAMILIES DESCRIPTION  
>>  
>>
>> {NAME => '0', DATA_BLOCK_ENCODING => 'FAST_DIFF', BLOOMFILTER => 'ROW', 
>> REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', 
>> MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP
>> _DELETED_CELLS => 'true', BLOCKSIZE => '65536', IN_MEMORY => 'false', 
>> BLOCKCACHE => 'true'}
>>       
>> 1 row(s) in 0.0280 seconds
>> 
>> Thanks,
>> Ben
>> 
>> 
>>> On Mar 15, 2016, at 11:59 PM, Ankit Singhal <ankitsingha...@gmail.com 
>>> <mailto:ankitsingha...@gmail.com>> wrote:
>>> 
>>> Yes it seems to. 
>>> Did you get any error related to SYSTEM.STATS when the client is connected 
>>> first time ?
>>> 
>&

Re: HBase Interpreter

2016-03-15 Thread Benjamin Kim
Any updates regarding this?

> On Feb 23, 2016, at 8:43 PM, Felix Cheung <felixcheun...@hotmail.com> wrote:
> 
> Hi Ben
> 
> 
> Not yet - I made the change but unfortunately it's not working. Have not had 
> the chance to debug through the HBase ruby code yet. I should have some time 
> next week.
> 
> 
> _________
> From: Benjamin Kim <bbuil...@gmail.com <mailto:bbuil...@gmail.com>>
> Sent: Tuesday, February 23, 2016 6:19 PM
> Subject: Re: HBase Interpreter
> To: <users@zeppelin.incubator.apache.org 
> <mailto:users@zeppelin.incubator.apache.org>>
> 
> 
> Hi Felix,
> 
> Any updates? Does the latest merged master have the hbase quorum properties?
> 
> Thanks,
> Ben
> 
> 
> On Feb 12, 2016, at 1:29 AM, Felix Cheung < felixcheun...@hotmail.com 
> <mailto:felixcheun...@hotmail.com>> wrote:
> 
> Cool, I think I have figured out how to set properties too. I might open a PR 
> tomorrow or later. 
> 
> 
> 
> 
> 
> On Thu, Feb 11, 2016 at 9:24 PM -0800, "Rajat Venkatesh" 
> <rvenkat...@qubole.com <mailto:rvenkat...@qubole.com>> wrote: 
> 
> Hi,
> I'll take a look over the weekend. Sorry for the delay in replying. 
> 
> On Wed, Feb 10, 2016 at 6:44 AM Felix Cheung < felixcheun...@hotmail.com 
> <mailto:felixcheun...@hotmail.com>> wrote: 
> It looks like hbase-site.xml is not picked up somehow.
> 
> Rajat would you know of a way to get that set with the ruby code? 
> 
> 
> _ 
> From: Benjamin Kim < bbuil...@gmail.com <mailto:bbuil...@gmail.com>> 
> Sent: Tuesday, February 9, 2016 2:58 PM
> 
> Subject: Re: HBase Interpreter 
> To: < users@zeppelin.incubator.apache.org 
> <mailto:users@zeppelin.incubator.apache.org>> 
> 
> 
> It looks like it’s not reaching the zookeeper quorum.
> 
> 16/02/09 21:52:19 ERROR client.ConnectionManager$HConnectionImplementation: 
> Can't get connection to ZooKeeper: KeeperErrorCode = ConnectionLoss for /hbase
> 
> And the setting is:
> 
> quorum=localhost:2181
> 
> The HBase quorum is actually namenode001, namenode002, hbase-master001. Where 
> do I set this?
> 
> Thanks,
> Ben
> 
> 
> On Feb 4, 2016, at 9:15 PM, Felix Cheung < felixcheun...@hotmail.com 
> <mailto:felixcheun...@hotmail.com>> wrote:
> 
> We could probably look into HBase/Pom.xml handling the vendor-repo profile 
> too.
> 
> 
> 
> 
> 
> On Thu, Feb 4, 2016 at 8:08 PM -0800, "Rajat Venkatesh" 
> <rvenkat...@qubole.com <mailto:rvenkat...@qubole.com>> wrote: 
> 
> Benjamin,
> Can you try compiling Zeppelin by changing the dependencies in hbase/pom.xml
> to use the Cloudera jars?
> In the long run, one option is to:
> 1. run and capture the output of 'bin/hbase classpath'
> 2. create a classloader
> 3. load all the classes from step 1 (see the sketch below)
> 
> Then it will work with any version of HBase, theoretically.
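A minimal sketch of that classloader idea, assuming the host running the interpreter has an HBase installation whose 'bin/hbase classpath' output contains plain jar paths (wildcard entries would need expanding first); the install path is only an illustration:

import java.io.File
import java.net.URLClassLoader
import scala.sys.process._

// 1. Capture the classpath that the local HBase installation reports.
val rawClasspath = Seq("/usr/lib/hbase/bin/hbase", "classpath").!!.trim  // illustrative install path

// 2. Turn every entry into a URL for an isolated classloader.
val urls = rawClasspath
  .split(File.pathSeparator)
  .filter(_.nonEmpty)
  .map(entry => new File(entry).toURI.toURL)

// 3. Classes resolved through this loader come from the vendor's jars,
//    whatever HBase version Zeppelin itself was built against.
val hbaseLoader = new URLClassLoader(urls, getClass.getClassLoader)
val confClass = hbaseLoader.loadClass("org.apache.hadoop.hbase.HBaseConfiguration")
println(confClass.getProtectionDomain.getCodeSource.getLocation)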
>  
> 
> On Fri, Feb 5, 2016 at 8:14 AM Benjamin Kim < bbuil...@gmail.com 
> <mailto:bbuil...@gmail.com>> wrote: 
> Felix,
> 
> I know that Cloudera practice. We hate that they do that without informing 
> anyone.
> 
> Thanks,
> Ben
> 
> 
> 
> On Feb 4, 2016, at 9:18 AM, Felix Cheung < felixcheun...@hotmail.com 
> <mailto:felixcheun...@hotmail.com>> wrote:
> 
> CDH is known to cherry pick patches from later releases. Maybe it is because 
> of that.
> 
> Rajat do you have any lead on the release compatibility issue?
> 
> 
> _ 
> From: Rajat Venkatesh < rvenkat...@qubole.com <mailto:rvenkat...@qubole.com>> 
> Sent: Wednesday, February 3, 2016 10:05 PM 
> Subject: Re: HBase Interpreter 
> To: < users@zeppelin.incubator.apache.org 
> <mailto:users@zeppelin.incubator.apache.org>> 
> 
> 
> Oh. That should work. I've tested with 1.0.0. Hmm
> 
> On Thu, Feb 4, 2016 at 10:50 AM Benjamin Kim < bbuil...@gmail.com 
> <mailto:bbuil...@gmail.com>> wrote: 
> Hi Rajat,
> 
> The version of HBase that comes with CDH 5.4.8 is 1.0.0. How do I check if 
> they are compatible?
> 
> Thanks,
> Ben
> 
> 
> On Feb 3, 2016, at 9:16 PM, Rajat Venkatesh < rvenkat...@qubole.com 
> <mailto:rvenkat...@qubole.com>> wrote:
> 
> Can you check the version of HBase ? HBase interpreter has been tested with 
> HBase 1.0.x and Hadoop 2.6.0. There is a good chance this error is due to 
> mismatch in versions. 
> 
> On Thu, Feb 4, 2016 at 10:20 AM Benjamin Kim < bbuil...@gmail.com 
> <mailto:bbuil...@gmail.com>> wrote: 
> I got this error below trying o

Re: Data Export

2016-03-15 Thread Benjamin Kim
Any updates as to the progress of this issue?

> On Feb 26, 2016, at 6:16 PM, Khalid Huseynov <khalid...@nflabs.com> wrote:
> 
> As far as I know there're few PRs (#6 
> <https://github.com/apache/incubator-zeppelin/pull/6>, #725 
> <https://github.com/apache/incubator-zeppelin/pull/725>, #89 
> <https://github.com/apache/incubator-zeppelin/pull/89>, #714 
> <https://github.com/apache/incubator-zeppelin/pull/714>) addressing similar 
> issue with some variation in approaches. They're being compared and probably 
> some resolution should be reached. You can also take a look and express your 
> opinion. Community may let us know if I'm missing something.
> 
> Best,
> Khalid
> 
> On Sat, Feb 27, 2016 at 2:23 AM, Benjamin Kim <bbuil...@gmail.com 
> <mailto:bbuil...@gmail.com>> wrote:
> I don’t know if I’m missing something, but is there a way to export the 
> result data into a CSV, Excel, etc. from a SQL statement?
> 
> Thanks,
> Ben
> 
> 



Re: S3 Zip File Loading Advice

2016-03-15 Thread Benjamin Kim
Hi Xinh,

I tried to wrap it, but it still didn’t work. I got a 
"java.util.ConcurrentModificationException”.

All,

I have been trying and trying with some help from a coworker, but it's slow 
going. I have been able to gather a list of the S3 files I need to download.

### S3 Lists ###
import scala.collection.JavaConverters._
import java.util.ArrayList
import java.util.zip.{ZipEntry, ZipInputStream}
import com.amazonaws.services.s3.AmazonS3Client
import com.amazonaws.services.s3.model.{ObjectListing, S3ObjectSummary, 
ListObjectsRequest, GetObjectRequest}
import org.apache.commons.io.IOUtils
import org.joda.time.{DateTime, Period}
import org.joda.time.format.DateTimeFormat

val s3Bucket = "amg-events"

val formatter = DateTimeFormat.forPattern("yyyy/MM/dd/HH")
var then = DateTime.now()

var files = new ArrayList[String]

//S3 Client and List Object Request
val s3Client = new AmazonS3Client()
val listObjectsRequest = new ListObjectsRequest()
var objectListing: ObjectListing = null

//Your S3 Bucket
listObjectsRequest.setBucketName(s3Bucket)

var now = DateTime.now()
var range = 
Iterator.iterate(now.minusDays(1))(_.plus(Period.hours(1))).takeWhile(!_.isAfter(now))
range.foreach(ymdh => {
  //Your Folder path or Prefix
  listObjectsRequest.setPrefix(formatter.print(ymdh))

  //Adding s3:// to the paths and adding to a list
  do {
objectListing = s3Client.listObjects(listObjectsRequest);
for (objectSummary <- objectListing.getObjectSummaries().asScala) {
  if (objectSummary.getKey().contains(".csv.zip") && 
objectSummary.getLastModified().after(then.toDate())) {
//files.add(objectSummary.getKey())
files.add("s3n://" + s3Bucket + "/" + objectSummary.getKey())
  }
}
listObjectsRequest.setMarker(objectListing.getNextMarker())
  } while (objectListing.isTruncated())
})
then = now

//Creating a Scala List for same
val fileList = files.asScala

//Parallelize the Scala List
val fileRDD = sc.parallelize(fileList)

Now, I am trying to go through the list and download each file, unzip each file 
as it comes, and pass the ZipInputStream to the CSV parser. This is where I get 
stuck.

var df: DataFrame = null
for (file <- fileList) {
  val zipfile = s3Client.getObject(new GetObjectRequest(s3Bucket, 
file)).getObjectContent()
  val zis = new ZipInputStream(zipfile)
  var ze = zis.getNextEntry()
//  val fileDf = 
sqlContext.read.format("com.databricks.spark.csv").option("header", 
"true").option("inferSchema", "true").load(zis)
//  if (df != null) {
//df = df.unionAll(fileDf)
//  } else {
//df = fileDf
//  }
}

I don’t know if I am doing it right or not. I also read that parallelizing 
fileList would allow parallel file retrieval. But, I don’t know how to proceed 
from here.

If you can help, I would be grateful.

Thanks,
Ben
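
For reference, a rough sketch of one way to finish the loop above on the driver: it reuses the fileList, s3Client, and s3Bucket values already defined, assumes each zip wraps exactly one comma-separated CSV with a header row and no quoted commas, and loads everything as strings with plain Spark SQL APIs rather than spark-csv:

import java.util.zip.ZipInputStream
import com.amazonaws.services.s3.model.GetObjectRequest
import org.apache.commons.io.IOUtils
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

var df: DataFrame = null
for (uri <- fileList) {
  val key = uri.stripPrefix("s3n://" + s3Bucket + "/")    // GetObjectRequest wants the bare key
  val zis = new ZipInputStream(
    s3Client.getObject(new GetObjectRequest(s3Bucket, key)).getObjectContent())
  if (zis.getNextEntry() != null) {                       // assumes a single CSV entry per archive
    val lines  = IOUtils.toString(zis, "UTF-8").split("\n").filter(_.trim.nonEmpty)
    val header = lines.head.split(",").map(_.trim)
    val schema = StructType(header.map(name => StructField(name, StringType, nullable = true)))
    // Naive split: fine only while fields never contain embedded commas or quotes.
    val rowRDD = sc.parallelize(lines.tail.map(l => Row.fromSeq(l.split(",", -1).toSeq)))
    val fileDf = sqlContext.createDataFrame(rowRDD, schema)
    df = if (df == null) fileDf else df.unionAll(fileDf)  // assumes every file shares the same header
  }
  zis.close()
}

Reading each archive on the driver is the simplest route; shipping the keys to executors (as the parallelized fileRDD hints at) would need the S3 client created inside mapPartitions instead.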


> On Mar 9, 2016, at 10:10 AM, Xinh Huynh <xinh.hu...@gmail.com> wrote:
> 
> Could you wrap the ZipInputStream in a List, since a subtype of 
> TraversableOnce[?] is required?
> 
> case (name, content) => List(new ZipInputStream(content.open))
> 
> Xinh
> 
> On Wed, Mar 9, 2016 at 7:07 AM, Benjamin Kim <bbuil...@gmail.com 
> <mailto:bbuil...@gmail.com>> wrote:
> Hi Sabarish,
> 
> I found a similar posting online where I should use the S3 listKeys. 
> http://stackoverflow.com/questions/24029873/how-to-read-multiple-text-files-into-a-single-rdd.
> Is this what you were thinking?
> 
> And, your assumption is correct. The zipped CSV file contains only a single 
> file. I found this posting:
> http://stackoverflow.com/questions/28969757/zip-support-in-apache-spark. I 
> see how to do the unzipping, but I cannot get it to work when running the 
> code directly.
> 
> ...
> import java.io.{ IOException, FileOutputStream, FileInputStream, File }
> import java.util.zip.{ ZipEntry, ZipInputStream }
> import org.apache.spark.input.PortableDataStream
> 
> sc.hadoopConfiguration.set("fs.s3n.impl","org.apache.hadoop.fs.s3native.NativeS3FileSystem")
> sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", accessKey)
> sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", secretKey)
> 
> val zipFile = 
> "s3n://events/2016/03/01/00/event-20160301.00-4877ff81-928f-4da4-89b6-6d40a28d61c7.csv.zip"
> val zipFileRDD = sc.binaryFiles(zipFile).flatMap { case (name: String, 
> content: PortableDataStream) => new ZipInputStream(content.open) }
> 
> :95: error: type mismatch;
>  found   : java.util.zip.ZipInputStream
&

Problem Updating Stats

2016-03-15 Thread Benjamin Kim
When trying to run UPDATE STATISTICS on an existing table in HBase, I get an error:
Update stats:
UPDATE STATISTICS "ops_csv" ALL
error:
ERROR 504 (42703): Undefined column. columnName=REGION_NAME
It looks like the metadata information is messed up, i.e. there is no column named 
REGION_NAME in this table.
I see similar errors for other tables that we currently have in HBase.

We are using CDH 5.5.2, HBase 1.0.0, and Phoenix 4.5.2.

Thanks,
Ben

Re: Spark Job on YARN accessing Hbase Table

2016-03-13 Thread Benjamin Kim
Ted,

Is there anything in the works or are there tasks already to do the 
back-porting?

Just curious.

Thanks,
Ben

> On Mar 13, 2016, at 3:46 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> 
> class HFileWriterImpl (in standalone file) is only present in master branch.
> It is not in branch-1.
> 
> compressionByName() resides in class with @InterfaceAudience.Private which 
> got moved in master branch.
> 
> So looks like there is some work to be done for backporting to branch-1 :-)
> 
> On Sun, Mar 13, 2016 at 1:35 PM, Benjamin Kim <bbuil...@gmail.com 
> <mailto:bbuil...@gmail.com>> wrote:
> Ted,
> 
> I did as you said, but it looks like that HBaseContext relies on some 
> differences in HBase itself.
> 
> [ERROR] 
> /home/bkim/hbase-rel-1.0.2/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseContext.scala:30:
>  error: object HFileWriterImpl is not a member of package 
> org.apache.hadoop.hbase.io.hfile
> [ERROR] import org.apache.hadoop.hbase.io.hfile.{CacheConfig, 
> HFileContextBuilder, HFileWriterImpl}
> [ERROR]^
> [ERROR] 
> /home/bkim/hbase-rel-1.0.2/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseContext.scala:627:
>  error: not found: value HFileWriterImpl
> [ERROR] val hfileCompression = HFileWriterImpl
> [ERROR]^
> [ERROR] 
> /home/bkim/hbase-rel-1.0.2/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseContext.scala:750:
>  error: not found: value HFileWriterImpl
> [ERROR] val defaultCompression = HFileWriterImpl
> [ERROR]  ^
> [ERROR] 
> /home/bkim/hbase-rel-1.0.2/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseContext.scala:898:
>  error: value COMPARATOR is not a member of object 
> org.apache.hadoop.hbase.CellComparator
> [ERROR] 
> .withComparator(CellComparator.COMPARATOR).withFileContext(hFileContext)
> 
> So… back to my original question… do you know when these incompatibilities
> were introduced? If so, I can pull the version from that time and try again.
> 
> Thanks,
> Ben
> 
>> On Mar 13, 2016, at 12:42 PM, Ted Yu <yuzhih...@gmail.com 
>> <mailto:yuzhih...@gmail.com>> wrote:
>> 
>> Benjamin:
>> Since hbase-spark is in its own module, you can pull the whole hbase-spark
>> subtree into hbase 1.0 root dir and add the following to root pom.xml:
>>     <module>hbase-spark</module>
>> 
>> Then you would be able to build the module yourself.
>> 
>> hbase-spark module uses APIs which are compatible with hbase 1.0
>> 
>> Cheers
>> 
>> On Sun, Mar 13, 2016 at 11:39 AM, Benjamin Kim <bbuil...@gmail.com 
>> <mailto:bbuil...@gmail.com>> wrote:
>> Hi Ted,
>> 
>> I see that you’re working on the hbase-spark module for hbase. I recently 
>> packaged the SparkOnHBase project and gave it a test run. It works like a 
>> charm on CDH 5.4 and 5.5. All I had to do was add 
>> /opt/cloudera/parcels/CDH/jars/htrace-core-3.1.0-incubating.jar to the 
>> classpath.txt file in /etc/spark/conf. Then, I ran spark-shell with “—jars 
>> /path/to/spark-hbase-0.0.2-clabs.jar” as an argument and used the 
>> easy-to-use HBaseContext for HBase operations. Now, I want to use the latest 
>> in Dataframes. Since the new functionality is only in the hbase-spark 
>> module, I want to know how to get it and package it for CDH 5.5, which still 
>> uses HBase 1.0.0. Can you tell me what version of hbase master is still 
>> backwards compatible?
>> 
>> By the way, we are using Spark 1.6 if it matters.
>> 
>> Thanks,
>> Ben
>> 
>>> On Feb 10, 2016, at 2:34 AM, Ted Yu <yuzhih...@gmail.com 
>>> <mailto:yuzhih...@gmail.com>> wrote:
>>> 
>>> Have you tried adding hbase client jars to spark.executor.extraClassPath ?
>>> 
>>> Cheers
>>> 
>>> On Wed, Feb 10, 2016 at 12:17 AM, Prabhu Joseph <prabhujose.ga...@gmail.com 
>>> <mailto:prabhujose.ga...@gmail.com>> wrote:
>>> + Spark-Dev
>>> 
>>> For a Spark job on YARN accessing hbase table, added all hbase client jars 
>>> into spark.yarn.dist.files, NodeManager when launching container i.e 
>>> executor, does localization and brings all hbase-client jars into executor 
>>> CWD, but still the executor tasks fail with ClassNotFoundException of hbase 
>>> client jars, when i checked launch container.sh , Classpath does not have 
>>> $PWD/* and hence all the hbase client jars are ignored.
>>> 
>>> Is spark.yarn.dist.files not for adding jars into the executor classpath.
>>> 
>>> T

Re: Spark Job on YARN accessing Hbase Table

2016-03-13 Thread Benjamin Kim
Ted,

I did as you said, but it looks like that HBaseContext relies on some 
differences in HBase itself.

[ERROR] 
/home/bkim/hbase-rel-1.0.2/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseContext.scala:30:
 error: object HFileWriterImpl is not a member of package 
org.apache.hadoop.hbase.io.hfile
[ERROR] import org.apache.hadoop.hbase.io.hfile.{CacheConfig, 
HFileContextBuilder, HFileWriterImpl}
[ERROR]^
[ERROR] 
/home/bkim/hbase-rel-1.0.2/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseContext.scala:627:
 error: not found: value HFileWriterImpl
[ERROR] val hfileCompression = HFileWriterImpl
[ERROR]^
[ERROR] 
/home/bkim/hbase-rel-1.0.2/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseContext.scala:750:
 error: not found: value HFileWriterImpl
[ERROR] val defaultCompression = HFileWriterImpl
[ERROR]  ^
[ERROR] 
/home/bkim/hbase-rel-1.0.2/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseContext.scala:898:
 error: value COMPARATOR is not a member of object 
org.apache.hadoop.hbase.CellComparator
[ERROR] 
.withComparator(CellComparator.COMPARATOR).withFileContext(hFileContext)

So… back to my original question… do you know when these incompatibilities were 
introduced? If so, I can pull the version from that time and try again.

Thanks,
Ben

> On Mar 13, 2016, at 12:42 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> 
> Benjamin:
> Since hbase-spark is in its own module, you can pull the whole hbase-spark
> subtree into hbase 1.0 root dir and add the following to root pom.xml:
> <module>hbase-spark</module>
> 
> Then you would be able to build the module yourself.
> 
> hbase-spark module uses APIs which are compatible with hbase 1.0
> 
> Cheers
> 
> On Sun, Mar 13, 2016 at 11:39 AM, Benjamin Kim <bbuil...@gmail.com 
> <mailto:bbuil...@gmail.com>> wrote:
> Hi Ted,
> 
> I see that you’re working on the hbase-spark module for hbase. I recently 
> packaged the SparkOnHBase project and gave it a test run. It works like a 
> charm on CDH 5.4 and 5.5. All I had to do was add 
> /opt/cloudera/parcels/CDH/jars/htrace-core-3.1.0-incubating.jar to the 
> classpath.txt file in /etc/spark/conf. Then, I ran spark-shell with “—jars 
> /path/to/spark-hbase-0.0.2-clabs.jar” as an argument and used the easy-to-use 
> HBaseContext for HBase operations. Now, I want to use the latest in 
> Dataframes. Since the new functionality is only in the hbase-spark module, I 
> want to know how to get it and package it for CDH 5.5, which still uses HBase 
> 1.0.0. Can you tell me what version of hbase master is still backwards 
> compatible?
> 
> By the way, we are using Spark 1.6 if it matters.
> 
> Thanks,
> Ben
> 
>> On Feb 10, 2016, at 2:34 AM, Ted Yu <yuzhih...@gmail.com 
>> <mailto:yuzhih...@gmail.com>> wrote:
>> 
>> Have you tried adding hbase client jars to spark.executor.extraClassPath ?
>> 
>> Cheers
>> 
>> On Wed, Feb 10, 2016 at 12:17 AM, Prabhu Joseph <prabhujose.ga...@gmail.com 
>> <mailto:prabhujose.ga...@gmail.com>> wrote:
>> + Spark-Dev
>> 
>> For a Spark job on YARN accessing hbase table, added all hbase client jars 
>> into spark.yarn.dist.files, NodeManager when launching container i.e 
>> executor, does localization and brings all hbase-client jars into executor 
>> CWD, but still the executor tasks fail with ClassNotFoundException of hbase 
>> client jars, when i checked launch container.sh , Classpath does not have 
>> $PWD/* and hence all the hbase client jars are ignored.
>> 
>> Is spark.yarn.dist.files not for adding jars into the executor classpath.
>> 
>> Thanks,
>> Prabhu Joseph 
>> 
>> On Tue, Feb 9, 2016 at 1:42 PM, Prabhu Joseph <prabhujose.ga...@gmail.com 
>> <mailto:prabhujose.ga...@gmail.com>> wrote:
>> Hi All,
>> 
>>  When i do count on a Hbase table from Spark Shell which runs as yarn-client 
>> mode, the job fails at count().
>> 
>> MASTER=yarn-client ./spark-shell
>> 
>> import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor, 
>> TableName}
>> import org.apache.hadoop.hbase.client.HBaseAdmin
>> import org.apache.hadoop.hbase.mapreduce.TableInputFormat
>>  
>> val conf = HBaseConfiguration.create()
>> conf.set(TableInputFormat.INPUT_TABLE,"spark")
>> 
>> val hBaseRDD = sc.newAPIHadoopRDD(conf, 
>> classOf[TableInputFormat],classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],classOf[org.apache.hadoop.hbase.client.Result])
>> hBaseRDD.count()
>> 
>> 
>> Tasks throw below exception, the actual exception is swallowed, a bug 
>> JDK

Re: Spark Job on YARN accessing Hbase Table

2016-03-13 Thread Benjamin Kim
Ted,

That’s great! I didn’t know. I will proceed with it as you said.

Thanks,
Ben

> On Mar 13, 2016, at 12:42 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> 
> Benjamin:
> Since hbase-spark is in its own module, you can pull the whole hbase-spark
> subtree into hbase 1.0 root dir and add the following to root pom.xml:
> <module>hbase-spark</module>
> 
> Then you would be able to build the module yourself.
> 
> hbase-spark module uses APIs which are compatible with hbase 1.0
> 
> Cheers
> 
> On Sun, Mar 13, 2016 at 11:39 AM, Benjamin Kim <bbuil...@gmail.com 
> <mailto:bbuil...@gmail.com>> wrote:
> Hi Ted,
> 
> I see that you’re working on the hbase-spark module for hbase. I recently 
> packaged the SparkOnHBase project and gave it a test run. It works like a 
> charm on CDH 5.4 and 5.5. All I had to do was add 
> /opt/cloudera/parcels/CDH/jars/htrace-core-3.1.0-incubating.jar to the 
> classpath.txt file in /etc/spark/conf. Then, I ran spark-shell with “—jars 
> /path/to/spark-hbase-0.0.2-clabs.jar” as an argument and used the easy-to-use 
> HBaseContext for HBase operations. Now, I want to use the latest in 
> Dataframes. Since the new functionality is only in the hbase-spark module, I 
> want to know how to get it and package it for CDH 5.5, which still uses HBase 
> 1.0.0. Can you tell me what version of hbase master is still backwards 
> compatible?
> 
> By the way, we are using Spark 1.6 if it matters.
> 
> Thanks,
> Ben
> 
>> On Feb 10, 2016, at 2:34 AM, Ted Yu <yuzhih...@gmail.com 
>> <mailto:yuzhih...@gmail.com>> wrote:
>> 
>> Have you tried adding hbase client jars to spark.executor.extraClassPath ?
>> 
>> Cheers
>> 
>> On Wed, Feb 10, 2016 at 12:17 AM, Prabhu Joseph <prabhujose.ga...@gmail.com 
>> <mailto:prabhujose.ga...@gmail.com>> wrote:
>> + Spark-Dev
>> 
>> For a Spark job on YARN accessing hbase table, added all hbase client jars 
>> into spark.yarn.dist.files, NodeManager when launching container i.e 
>> executor, does localization and brings all hbase-client jars into executor 
>> CWD, but still the executor tasks fail with ClassNotFoundException of hbase 
>> client jars, when i checked launch container.sh , Classpath does not have 
>> $PWD/* and hence all the hbase client jars are ignored.
>> 
>> Is spark.yarn.dist.files not for adding jars into the executor classpath.
>> 
>> Thanks,
>> Prabhu Joseph 
>> 
>> On Tue, Feb 9, 2016 at 1:42 PM, Prabhu Joseph <prabhujose.ga...@gmail.com 
>> <mailto:prabhujose.ga...@gmail.com>> wrote:
>> Hi All,
>> 
>>  When i do count on a Hbase table from Spark Shell which runs as yarn-client 
>> mode, the job fails at count().
>> 
>> MASTER=yarn-client ./spark-shell
>> 
>> import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor, 
>> TableName}
>> import org.apache.hadoop.hbase.client.HBaseAdmin
>> import org.apache.hadoop.hbase.mapreduce.TableInputFormat
>>  
>> val conf = HBaseConfiguration.create()
>> conf.set(TableInputFormat.INPUT_TABLE,"spark")
>> 
>> val hBaseRDD = sc.newAPIHadoopRDD(conf, 
>> classOf[TableInputFormat],classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],classOf[org.apache.hadoop.hbase.client.Result])
>> hBaseRDD.count()
>> 
>> 
>> Tasks throw below exception, the actual exception is swallowed, a bug 
>> JDK-7172206. After installing hbase client on all NodeManager machines, the 
>> Spark job ran fine. So I confirmed that the issue is with executor classpath.
>> 
>> But i am searching for some other way of including hbase jars in spark 
>> executor classpath instead of installing hbase client on all NM machines. 
>> Tried adding all hbase jars in spark.yarn.dist.files , NM logs shows that it 
>> localized all hbase jars, still the job fails. Tried 
>> spark.executor.extraClasspath, still the job fails.
>> 
>> Is there any way we can access hbase from Executor without installing 
>> hbase-client on all machines.
>> 
>> 
>> 16/02/09 02:34:57 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 
>> prabhuFS1): java.lang.IllegalStateException: unread block data
>> at 
>> java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2428)
>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
>> at 
>> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
>> at 
>> java.io.ObjectInputStream.readSerialData(ObjectI

Re: Spark Job on YARN accessing Hbase Table

2016-03-13 Thread Benjamin Kim
Hi Ted,

I see that you’re working on the hbase-spark module for hbase. I recently 
packaged the SparkOnHBase project and gave it a test run. It works like a charm 
on CDH 5.4 and 5.5. All I had to do was add 
/opt/cloudera/parcels/CDH/jars/htrace-core-3.1.0-incubating.jar to the 
classpath.txt file in /etc/spark/conf. Then, I ran spark-shell with "--jars 
/path/to/spark-hbase-0.0.2-clabs.jar" as an argument and used the easy-to-use 
HBaseContext for HBase operations. Now, I want to use the latest in Dataframes. 
Since the new functionality is only in the hbase-spark module, I want to know 
how to get it and package it for CDH 5.5, which still uses HBase 1.0.0. Can you 
tell me what version of hbase master is still backwards compatible?

By the way, we are using Spark 1.6 if it matters.

Thanks,
Ben

> On Feb 10, 2016, at 2:34 AM, Ted Yu  wrote:
> 
> Have you tried adding hbase client jars to spark.executor.extraClassPath ?
> 
> Cheers
> 
> On Wed, Feb 10, 2016 at 12:17 AM, Prabhu Joseph  > wrote:
> + Spark-Dev
> 
> For a Spark job on YARN accessing an HBase table, I added all the HBase client
> jars to spark.yarn.dist.files. The NodeManager, when launching the container
> (i.e. the executor), does localization and brings all the hbase-client jars into
> the executor CWD, but the executor tasks still fail with a ClassNotFoundException
> for the HBase client classes. When I checked launch_container.sh, the classpath
> does not have $PWD/*, and hence all the HBase client jars are ignored.
> 
> Is spark.yarn.dist.files not for adding jars to the executor classpath?
> 
> Thanks,
> Prabhu Joseph 
> 
> On Tue, Feb 9, 2016 at 1:42 PM, Prabhu Joseph  > wrote:
> Hi All,
> 
>  When I do a count on an HBase table from the Spark shell, which runs in
> yarn-client mode, the job fails at count().
> 
> MASTER=yarn-client ./spark-shell
> 
> import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor, 
> TableName}
> import org.apache.hadoop.hbase.client.HBaseAdmin
> import org.apache.hadoop.hbase.mapreduce.TableInputFormat
>  
> val conf = HBaseConfiguration.create()
> conf.set(TableInputFormat.INPUT_TABLE,"spark")
> 
> val hBaseRDD = sc.newAPIHadoopRDD(conf, 
> classOf[TableInputFormat],classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],classOf[org.apache.hadoop.hbase.client.Result])
> hBaseRDD.count()
> 
> 
> The tasks throw the exception below; the actual exception is swallowed due to a
> bug (JDK-7172206). After installing the HBase client on all NodeManager machines,
> the Spark job ran fine, so I confirmed that the issue is with the executor
> classpath.
> 
> But I am searching for some other way of including the HBase jars in the Spark
> executor classpath instead of installing the HBase client on all NM machines. I
> tried adding all the HBase jars to spark.yarn.dist.files; the NM logs show that it
> localized all the HBase jars, but the job still fails. I tried
> spark.executor.extraClassPath, and the job still fails.
> 
> Is there any way we can access HBase from an executor without installing
> hbase-client on all machines?
> 
> 
> 16/02/09 02:34:57 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 
> prabhuFS1): java.lang.IllegalStateException: unread block data
> at 
> java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2428)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
> at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1997)
> at 
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1921)
> at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> at 
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:68)
> at 
> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:94)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:185)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> 
> 
> 
> Thanks,
> Prabhu Joseph
> 
> 
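
For reference, a minimal sketch of the spark.executor.extraClassPath suggestion from earlier in this thread; the parcel path is only an illustration, and the jars must already exist at that location on every NodeManager host (when they are not installed, shipping them with --jars is the usual alternative, since --jars also puts them on the executor classpath):

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: point executors at an HBase client lib directory that is assumed to be
// present at the same path on every NodeManager host (illustrative CDH parcel path).
val conf = new SparkConf()
  .setAppName("hbase-count")
  .set("spark.executor.extraClassPath", "/opt/cloudera/parcels/CDH/lib/hbase/lib/*")
// The driver-side equivalent has to be supplied at launch time (for example with
// spark-submit --driver-class-path); setting it here is too late for the driver JVM.
val sc = new SparkContext(conf)

// In spark-shell the same key would be passed at startup instead:
//   --conf spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hbase/lib/*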



Re: S3 Zip File Loading Advice

2016-03-09 Thread Benjamin Kim
Hi Sabarish,

I found a similar posting online where I should use the S3 listKeys. 
http://stackoverflow.com/questions/24029873/how-to-read-multiple-text-files-into-a-single-rdd.
 Is this what you were thinking?

And, your assumption is correct. The zipped CSV file contains only a single 
file. I found this posting. 
http://stackoverflow.com/questions/28969757/zip-support-in-apache-spark. I see 
how to do the unzipping, but I cannot get it to work when running the code 
directly.

...
import java.io.{ IOException, FileOutputStream, FileInputStream, File }
import java.util.zip.{ ZipEntry, ZipInputStream }
import org.apache.spark.input.PortableDataStream

sc.hadoopConfiguration.set("fs.s3n.impl","org.apache.hadoop.fs.s3native.NativeS3FileSystem")
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", accessKey)
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", secretKey)

val zipFile = 
"s3n://events/2016/03/01/00/event-20160301.00-4877ff81-928f-4da4-89b6-6d40a28d61c7.csv.zip"
val zipFileRDD = sc.binaryFiles(zipFile).flatMap { case (name: String, content: 
PortableDataStream) => new ZipInputStream(content.open) }

<console>:95: error: type mismatch;
 found   : java.util.zip.ZipInputStream
 required: TraversableOnce[?]
 val zipFileRDD = sc.binaryFiles(zipFile).flatMap { case (name, 
content) => new ZipInputStream(content.open) }

^

Thanks,
Ben
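
A minimal sketch of one way around that type mismatch: rather than returning the ZipInputStream itself, read the single entry inside flatMap and return its lines, which satisfies the TraversableOnce requirement (this reuses the zipFile value above and assumes one CSV per zip):

import java.util.zip.ZipInputStream
import org.apache.commons.io.IOUtils
import org.apache.spark.input.PortableDataStream

val linesRDD = sc.binaryFiles(zipFile).flatMap {
  case (name: String, content: PortableDataStream) =>
    val zis = new ZipInputStream(content.open())
    try {
      if (zis.getNextEntry() != null)
        IOUtils.toString(zis, "UTF-8").split("\n").toSeq   // a Seq is a TraversableOnce
      else
        Seq.empty[String]
    } finally {
      zis.close()
    }
}
linesRDD.take(5).foreach(println)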

> On Mar 9, 2016, at 12:03 AM, Sabarish Sasidharan <sabarish@gmail.com> 
> wrote:
> 
> You can use S3's listKeys API and do a diff between consecutive listKeys to 
> identify what's new.
> 
> Are there multiple files in each zip? Single file archives are processed just 
> like text as long as it is one of the supported compression formats.
> 
> Regards
> Sab
> 
> On Wed, Mar 9, 2016 at 10:33 AM, Benjamin Kim <bbuil...@gmail.com 
> <mailto:bbuil...@gmail.com>> wrote:
> I am wondering if anyone can help.
> 
> Our company stores zipped CSV files in S3, which has been a big headache from 
> the start. I was wondering if anyone has created a way to iterate through 
> several subdirectories (s3n://events/2016/03/01/00, s3n://2016/03/01/01, 
> etc.) in S3 to find the newest files and load them. It would be a big bonus 
> to include the unzipping of the file in the process so that the CSV can be 
> loaded directly into a dataframe for further processing. I’m pretty sure that 
> the S3 part of this request is not uncommon. I would think the file being 
> zipped is uncommon. If anyone can help, I would truly be grateful for I am 
> new to Scala and Spark. This would be a great help in learning.
> 
> Thanks,
> Ben
> 
> 



S3 Zip File Loading Advice

2016-03-08 Thread Benjamin Kim
I am wondering if anyone can help.

Our company stores zipped CSV files in S3, which has been a big headache from 
the start. I was wondering if anyone has created a way to iterate through 
several subdirectories (s3n://events/2016/03/01/00, s3n://2016/03/01/01, etc.) 
in S3 to find the newest files and load them. It would be a big bonus to 
include the unzipping of the file in the process so that the CSV can be loaded 
directly into a dataframe for further processing. I’m pretty sure that the S3 
part of this request is not uncommon. I would think the file being zipped is 
uncommon. If anyone can help, I would truly be grateful for I am new to Scala 
and Spark. This would be a great help in learning.

Thanks,
Ben



Re: Steps to Run Spark Scala job from Oozie on EC2 Hadoop clsuter

2016-03-07 Thread Benjamin Kim
To comment…

At my company, we have not gotten it to work in any other mode than local. If 
we try any of the yarn modes, it fails with a “file does not exist” error when 
trying to locate the executable jar. I mentioned this to the Hue users group, 
which we used for this, and they replied that the Spark Action is a very basic 
implementation and that they will be writing their own for production use.

That’s all I know...

> On Mar 7, 2016, at 1:18 AM, Deepak Sharma  wrote:
> 
> There is Spark action defined for oozie workflows.
> Though I am not sure if it supports only Java SPARK jobs or Scala jobs as 
> well.
> https://oozie.apache.org/docs/4.2.0/DG_SparkActionExtension.html 
> 
> Thanks
> Deepak
> 
> On Mon, Mar 7, 2016 at 2:44 PM, Divya Gehlot  > wrote:
> Hi,
> 
> Could somebody help me by providing the steps /redirect me  to 
> blog/documentation on how to run Spark job written in scala through Oozie.
> 
> Would really appreciate the help.
> 
> 
> 
> Thanks,
> Divya 
> 
> 
> 
> -- 
> Thanks
> Deepak
> www.bigdatabig.com 
> www.keosha.net 


Hadoop 2.8 Release Data

2016-03-04 Thread Benjamin Kim
I have a general question about Hadoop 2.8. Is it being prepped for release 
anytime soon? I am awaiting HADOOP-5732 bringing SFTP support natively.

Thanks,
Ben



Re: SFTP Compressed CSV into Dataframe

2016-03-03 Thread Benjamin Kim
Sumedh,

How would this work? The only server that we have is the Oozie server with no 
resources to run anything except Oozie, and we have no sudo permissions. If we 
run the mount command using the shell action which can run on any node of the 
cluster via YARN, then the spark job will not be able to see it because it 
could exist on any random unknown node. If we run the mount command using shell 
commands in Spark, could it be possible that the mount will exist on the same 
node as the executor reading the file?

Thanks,
Ben 
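
For what it's worth, a rough sketch of the sshfs idea from the quoted reply below, driven from Scala with scala.sys.process; it only helps where the mount is visible to the process doing the read (the driver in local mode, or every executor host otherwise), which is exactly the concern raised above. The host and paths are placeholders, and sshfs/fusermount must be installed and usable without sudo:

import scala.sys.process._

val remote     = "user@sftp-host:/data/incoming"   // placeholder remote location
val mountPoint = "/tmp/sftp-mount"                 // placeholder local mount point

// Mount at the start of the job...
require(Seq("mkdir", "-p", mountPoint).! == 0)
require(Seq("sshfs", remote, mountPoint).! == 0, s"sshfs mount of $remote failed")
try {
  // ...read through the mount (only valid where the mount actually exists)...
  val df = sqlContext.read
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .load("file://" + mountPoint + "/report.csv.gz")
  println(df.count())
} finally {
  // ...and unmount at the end, even if the read fails.
  Seq("fusermount", "-u", mountPoint).!
}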

> On Mar 3, 2016, at 10:29 AM, Sumedh Wale <sw...@snappydata.io> wrote:
> 
> (-user)
> 
> On Thursday 03 March 2016 10:09 PM, Benjamin Kim wrote:
>> I forgot to mention that we will be scheduling this job using Oozie. So, we 
>> will not be able to know which worker node is going to being running this. 
>> If we try to do anything local, it would get lost. This is why I’m looking 
>> for something that does not deal with the local file system.
> 
> Can't you mount using sshfs locally as part of the job at the start, then 
> unmount at the end? This is assuming that the platform being used is Linux.
> 
>>> On Mar 2, 2016, at 11:17 AM, Benjamin Kim <bbuil...@gmail.com> wrote:
>>> 
>>> I wonder if anyone has opened a SFTP connection to open a remote GZIP CSV 
>>> file? I am able to download the file first locally using the SFTP Client in 
>>> the spark-sftp package. Then, I load the file into a dataframe using the 
>>> spark-csv package, which automatically decompresses the file. I just want 
>>> to remove the "downloading file to local" step and directly have the remote 
>>> file decompressed, read, and loaded. Can someone give me any hints?
>>> 
>>> Thanks,
>>> Ben
> 
> thanks
> 
> -- 
> Sumedh Wale
> SnappyData (http://www.snappydata.io)
> 
> 





Re: Building a REST Service with Spark back-end

2016-03-02 Thread Benjamin Kim
I want to ask about something related to this.

Does anyone know if there is or will be a command line equivalent of 
spark-shell client for Livy Spark Server or any other Spark Job Server? The 
reason that I am asking is that spark-shell does not handle multiple users on the 
same server well. Since a Spark Job Server can generate "sessions" for each user, it 
would be great if this were possible.

Another person in the Livy users group pointed out some advantages.

I think the use case makes complete sense for a number of reasons:
1. You wouldn't need an installation of spark and configs on the gateway machine
2. Since Livy is over HTTP, it'd be easier to run spark-shell in front of a 
firewall
3. Can "connect/disconnect" to sessions similar to screen in linux

Thanks,
Ben
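
For anyone exploring this, a rough sketch of what such a command-line client would do against Livy's REST API (default port 8998). The host is a placeholder, error handling and polling for session readiness are omitted, and the session id is hard-coded instead of being parsed from the create response:

import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets
import scala.io.Source

// Tiny helper: POST a JSON body and return the raw JSON response.
def postJson(url: String, json: String): String = {
  val conn = new URL(url).openConnection().asInstanceOf[HttpURLConnection]
  conn.setRequestMethod("POST")
  conn.setRequestProperty("Content-Type", "application/json")
  conn.setDoOutput(true)
  conn.getOutputStream.write(json.getBytes(StandardCharsets.UTF_8))
  Source.fromInputStream(conn.getInputStream).mkString
}

val livy = "http://livy-host:8998"   // placeholder Livy server

// 1. Open an interactive Scala session (one per user; it survives client disconnects).
println(postJson(livy + "/sessions", """{"kind": "spark"}"""))

// 2. Once the session is idle, ship code to it much like typing into spark-shell.
println(postJson(livy + "/sessions/0/statements",   // session id 0 assumed for brevity
  """{"code": "sc.parallelize(1 to 100).sum()"}"""))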

> On Mar 2, 2016, at 1:11 PM, Guru Medasani  wrote:
> 
> Hi Yanlin,
> 
> This is a fairly new effort and is not officially released/supported by 
> Cloudera yet. I believe those numbers will be out once it is released.
> 
> Guru Medasani
> gdm...@gmail.com 
> 
> 
> 
>> On Mar 2, 2016, at 10:40 AM, yanlin wang > > wrote:
>> 
>> Did any one use Livy in real world high concurrency web app? I think it uses 
>> spark submit command line to create job... How about  job server or notebook 
>> comparing with Livy?
>> 
>> Thx,
>> Yanlin
>> 
>> Sent from my iPhone
>> 
>> On Mar 2, 2016, at 6:24 AM, Guru Medasani > > wrote:
>> 
>>> Hi Don,
>>> 
>>> Here is another REST interface for interacting with Spark from anywhere. 
>>> 
>>> https://github.com/cloudera/livy 
>>> 
>>> Here is an example to estimate PI using Spark from Python using requests 
>>> library. 
>>> 
>>> >>> data = {
>>> ...   'code': textwrap.dedent("""\
>>> ...     val NUM_SAMPLES = 100000;
>>> ...     val count = sc.parallelize(1 to NUM_SAMPLES).map { i =>
>>> ...       val x = Math.random();
>>> ...       val y = Math.random();
>>> ...       if (x*x + y*y < 1) 1 else 0
>>> ...     }.reduce(_ + _);
>>> ...     println(\"Pi is roughly \" + 4.0 * count / NUM_SAMPLES)
>>> ...     """)
>>> ... }
>>> >>> r = requests.post(statements_url, data=json.dumps(data), headers=headers)
>>> >>> pprint.pprint(r.json())
>>> {u'id': 1,
>>>  u'output': {u'data': {u'text/plain': u'Pi is roughly 3.14004\nNUM_SAMPLES: Int = 100000\ncount: Int = 78501'},
>>>  u'execution_count': 1,
>>>  u'status': u'ok'},
>>>  u'state': u'available'}
>>> 
>>> 
>>> Guru Medasani
>>> gdm...@gmail.com 
>>> 
>>> 
>>> 
 On Mar 2, 2016, at 7:47 AM, Todd Nist > wrote:
 
 Have you looked at Apache Toree (http://toree.apache.org/)? This was formerly
 the Spark-Kernel from IBM but contributed to Apache.
 
 https://github.com/apache/incubator-toree 
 
 
 You can find a good overview on the spark-kernel here:
 http://www.spark.tc/how-to-enable-interactive-applications-against-apache-spark/
  
 
 
 Not sure if that is of value to you or not.
 
 HTH.
 
 -Todd
 
 On Tue, Mar 1, 2016 at 7:30 PM, Don Drake > wrote:
 I'm interested in building a REST service that utilizes a Spark SQL 
 Context to return records from a DataFrame (or IndexedRDD?) and even 
 add/update records.
 
 This will be a simple REST API, with only a few end-points.  I found this 
 example:
 
 https://github.com/alexmasselot/spark-play-activator 
 
 
 which looks close to what I am interested in doing.  
 
 Are there any other ideas or options if I want to run this in a YARN 
 cluster?
 
 Thanks.
 
 -Don
 
 -- 
 Donald Drake
 Drake Consulting
 http://www.drakeconsulting.com/ 
 https://twitter.com/dondrake 
 800-733-2143 
>>> 
> 



SFTP Compressed CSV into Dataframe

2016-03-02 Thread Benjamin Kim
I wonder if anyone has opened an SFTP connection to open a remote gzipped CSV file? 
I am able to download the file first locally using the SFTP Client in the 
spark-sftp package. Then, I load the file into a dataframe using the spark-csv 
package, which automatically decompresses the file. I just want to remove the 
"downloading file to local" step and directly have the remote file 
decompressed, read, and loaded. Can someone give me any hints?

Thanks,
Ben
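
For reference, a rough sketch of streaming the remote gzip straight into the driver over SFTP with JSch (assuming com.jcraft.jsch is on the classpath), so no local copy is ever written; host, credentials, and path are placeholders, and the CSV parsing of the resulting lines is left as before:

import java.util.zip.GZIPInputStream
import com.jcraft.jsch.{ChannelSftp, JSch}
import scala.io.Source

// Open an SFTP channel and stream the remote file without writing it locally.
val jsch    = new JSch()
val session = jsch.getSession("user", "sftp-host", 22)   // placeholder credentials/host
session.setPassword("secret")
session.setConfig("StrictHostKeyChecking", "no")          // sketch only; verify host keys in real use
session.connect()

val channel = session.openChannel("sftp").asInstanceOf[ChannelSftp]
channel.connect()

// ChannelSftp.get returns an InputStream; wrap it to decompress on the fly.
val gzipStream = new GZIPInputStream(channel.get("/remote/path/report.csv.gz"))
val lines      = Source.fromInputStream(gzipStream, "UTF-8").getLines().toList

channel.disconnect()
session.disconnect()

// From here the lines can be parallelized and split into a DataFrame as needed.
val linesRDD = sc.parallelize(lines)
println(linesRDD.count())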






Re: Spark on Kudu

2016-03-01 Thread Benjamin Kim
Hi J-D,

Quick question… Is there an ETA for KUDU-1214? I want to target a version of 
Kudu to begin real testing of Spark against it for our devs. At least, I can 
tell them what timeframe to anticipate.

Just curious,
Benjamin Kim
Data Solutions Architect

[a•mo•bee] (n.) the company defining digital marketing.

Mobile: +1 818 635 2900
3250 Ocean Park Blvd, Suite 200  |  Santa Monica, CA 90405  |  
www.amobee.com<http://www.amobee.com/>

On Feb 24, 2016, at 3:51 PM, Jean-Daniel Cryans 
<jdcry...@apache.org<mailto:jdcry...@apache.org>> wrote:

The DStream stuff isn't there at all. I'm not sure if it's needed either.

The kuduRDD is just leveraging the MR input format, ideally we'd use scans 
directly.

The SparkSQL stuff is there but it doesn't do any sort of pushdown. It's really 
basic.

The goal was to provide something for others to contribute to. We have some 
basic unit tests that others can easily extend. None of us on the team are 
Spark experts, but we'd be really happy to assist one improve the kudu-spark 
code.

J-D

On Wed, Feb 24, 2016 at 3:41 PM, Benjamin Kim 
<bbuil...@gmail.com<mailto:bbuil...@gmail.com>> wrote:
J-D,

It looks like it fulfills most of the basic requirements (kudu RDD, kudu 
DStream) in KUDU-1214. Am I right? Besides shoring up more Spark SQL 
functionality (Dataframes) and doing the documentation, what more needs to be 
done? Optimizations?

I believe that it’s a good place to start using Spark with Kudu and compare it 
to HBase with Spark (not clean).

Thanks,
Ben


On Feb 24, 2016, at 3:10 PM, Jean-Daniel Cryans 
<jdcry...@apache.org<mailto:jdcry...@apache.org>> wrote:

AFAIK no one is working on it, but we did manage to get this in for 0.7.0: 
https://issues.cloudera.org/browse/KUDU-1321

It's a really simple wrapper, and yes you can use SparkSQL on Kudu, but it will 
require a lot more work to make it fast/useful.

Hope this helps,

J-D

On Wed, Feb 24, 2016 at 3:08 PM, Benjamin Kim 
<bbuil...@gmail.com<mailto:bbuil...@gmail.com>> wrote:
I see KUDU-1214 (https://issues.cloudera.org/browse/KUDU-1214) targeted for 
0.8.0, but I see no progress on it. When this is complete, will this mean that 
Spark will be able to work with Kudu both programmatically and as a client via 
Spark SQL? Or is there more work that needs to be done on the Spark side for it 
to work?

Just curious.

Cheers,
Ben







Re: [DISCUSS] Update Roadmap

2016-03-01 Thread Benjamin Kim
I see in the Enterprise section that multi-tenancy will be included; will this 
include user impersonation too? That way, the executing user will be the user 
owning the process.

> On Mar 1, 2016, at 12:51 AM, Shabeel Syed  wrote:
> 
> +1
> 
> Hi Tamas,
>Pluggable external visualization is really a GREAT feature to have. I'm 
> looking forward to this :)
> 
> Regards
> Shabeel
> 
> On Tue, Mar 1, 2016 at 2:16 PM, Tamas Szuromi  > wrote:
> Hey,
> 
> Really promising roadmap.
> 
> I'd only push more visualization options. I agree built in visualization is 
> needed with limited charting options but I think we also need somehow 
> 'inject' external js visualizations also. 
> 
> 
> For scheduling Zeppelin notebooks  we use https://github.com/airbnb/airflow 
>  through the job rest api. It's an 
> enterprise ready and very robust solution right now.
> 
> Tamas
> 
> 
> On 1 March 2016 at 09:12, Eran Witkon  > wrote:
> One point to clarify, I don't want to suggest Oozie in specific, I want to 
> think about which features we develop and which ones we integrate external, 
> preferred Apache, technology? We don't think about building our own storage 
> services so why build our own scheduler?
> Eran 
> On Tue, 1 Mar 2016 at 09:49 moon soo Lee  > wrote:
> @Vinayak, @Eran, @Benjamin, @Guilherme, @Sourav, @Rick
> Now I can see a lot of demands around enterprise level job scheduling. Either 
> external or built-in, I completely agree having enterprise level job 
> scheduling support on the roadmap.
> ZEPPELIN-137 , 
> ZEPPELIN-531  are related 
> issues i can find in our JIRA.
> 
> @Vinayak
> Regarding importing notebook from github, Zeppelin has pluggable notebook 
> storage layer (see related package 
> ).
>  So, github notebook sync can be implemented easily.
> 
> @Shabeel
> Right, we need better manage management to prevent such OOM.
> And i think table is one of the most frequently used way of displaying data. 
> So definitely, we'll need more features like filter, sort, etc.
> After this roadmap discussion, discussion for the next release will follow. 
> Then we'll get idea when those features will be available.
> 
> @Prasad
> Thanks for mentioning HA and DR. They're really important subject for 
> enterprise use. Definitely Zeppelin will need to address them.
> And displaying meta information of notebook on top level page is good idea.
> 
> It's really great to hear many opinions and ideas.
> And thanks @Rick for sharing valuable view to Zeppelin project.
> 
> Thanks,
> moon
> 
> 
> On Mon, Feb 29, 2016 at 11:14 PM Rick Moritz  > wrote:
> Hi,
> 
> For one, I know that there is rudimentary scheduling built into Zeppelin 
> already (at least I fixed a bug in the test for a scheduling feature a few 
> months ago).
> But another point is, that Zeppelin should also focus on quality, 
> reproduceability and portability.
> Although this doesn't offer exciting new features, it would make development 
> much easier.
> 
> Cross-platform testability, Tests that pass when run sequentially, 
> compatibility with Firefox, and many more open issues that make it so much 
> harder to enhance Zeppelin and add features should be addressed soon, 
> preferably before more features are added. Already Zeppelin is suffering - in 
> my opinion - from quite a lot of feature creep, and we should avoid putting 
> in the kitchen sink, at the cost of quality and maintainability. Instead 
> modularity (ZEPPELIN-533 in particular) should be targeted.
> 
> Oozie, in my opinion, is a dead end - it may de-facto still be in use on many 
> clusters, but it's not getting the love it needs, and I wouldn't bet on it, 
> when it comes to integrating scheduling. Instead, any external tool should be 
> able to use the REST-API to trigger executions, if you want external 
> scheduling.
> 
> So, in conclusion, if we take Moon's list as a list of descending priorities, 
> I fully agree, under the condition that code quality is included as a subset 
> of enterprise-readyness. Auth* is paramount (Kerberos SPNEGO SSO support is 
> what we really want) with user and group rights assignment on the notebook 
> level. We probably also need Knox-integration (ODP-Members looking at 
> integrating Zeppelin should consider contributing this), and integration of 
> something like Spree (https://github.com/hammerlab/spree 
> ) to be able to profile jobs.
> 
> I'm hopeful that soon I can resume contributing some quality-oriented code, 
> to drive this 

Re: [DISCUSS] Update Roadmap

2016-02-29 Thread Benjamin Kim
I concur with this suggestion. In the enterprise, management would like scheduled 
runs to be tracked, monitored, and given SLA constraints for mission-critical work. 
Alerts and notifications are crucial for DevOps to respond with error clarification. 
If Zeppelin notebooks can be executed by a third-party scheduling application, such 
as Oozie, then this requirement can be satisfied even if there are no immediate 
plans for a built-in scheduler.

> On Feb 29, 2016, at 1:17 AM, Eran Witkon  wrote:
> 
> @Vinayak Agrawal I would suggest adding the ability to connect Zeppelin to
> existing scheduling/workflow tools such as https://oozie.apache.org/. This
> requires better hooks and status reporting but doesn't make Zeppelin an
> ETL/scheduler tool by itself.
> 
> 
> On Mon, Feb 29, 2016 at 10:21 AM Vinayak Agrawal wrote:
> Moon,
> The new roadmap looks very promising. I am very happy to see security in the 
> list.
> I have some suggestions regarding Enterprise Ready features:
> 
> 1. Job Scheduler - Can this be improved? 
> Currently the scheduler can be used with a cron expression or a pre-set time. 
> But in an enterprise solution, a notebook might be one piece of the workflow. 
> Can we look towards scheduling notebooks based on other 
> notebooks finishing their jobs successfully?
> This requirement would arise in any ETL workflow, where all the downstream 
> users wait for the ETL notebook to finish successfully. Only after that, 
> other business oriented notebooks can be executed.  
> 
> 2. Importing a notebook - Is there a current requirement or future plan to 
> implement a feature that allows import-notebook-from-github? This would allow 
> users to share notebooks seamlessly. 
> 
> Thanks 
> Vinayak
> 
> On Sun, Feb 28, 2016 at 11:22 PM, moon soo Lee wrote:
> Zhong Wang, 
> Right, folder support would be quite useful. Thanks for the opinion. 
> Hope I can finish the work in pr-190.
> 
> Sourav,
> Regarding concurrent running, Zeppelin doesn't have a limitation on running 
> paragraphs/queries concurrently. Each interpreter can implement its own scheduling 
> policy. For example, the SparkSQL interpreter and ShellInterpreter can already 
> run paragraphs/queries concurrently.
> 
> SparkInterpreter is implemented with a FIFO scheduler because of the nature of the 
> scala compiler. That's why users cannot run multiple paragraphs concurrently 
> when they work with SparkInterpreter.
> But as Zhong Wang mentioned, pr-703 gives each notebook a separate 
> scala compiler, so paragraphs can run concurrently as long as they're in different 
> notebooks.
> Thanks for the feedback!
> 
> Best,
> moon
> On Sat, Feb 27, 2016 at 8:59 PM Zhong Wang wrote:
> Sourav: I think this newly merged PR can help you 
> https://github.com/apache/incubator-zeppelin/pull/703#issuecomment-185582537 
> 
> 
> On Sat, Feb 27, 2016 at 1:46 PM, Sourav Mazumder wrote:
> Hi Moon,
> 
> This looks great.
> 
> My only suggestion would be to include a PR/feature - Support for Running 
> Concurrent paragraphs/queries in Zeppelin. 
> 
> Right now if more than one user tries to run paragraphs in multiple notebooks 
> concurrently through a single Zeppelin instance (and single interpreter 
> instance) the performance is very slow. It is obvious that the queue gets 
> built up within the zeppelin process and interpreter process in that scenario 
> as the time taken to move the status from start to pending and pending to 
> running is very high compared to the actual running time of a paragraph.
> 
> Without this the multi tenancy support would be meaningless as no one can 
> practically use it in a situation where multiple users are trying to connect 
> to the same instance of Zeppelin (and the related interpreter). A possible 
> solution would be to spawn separate instance of the same interpreter at every 
> notebook/user level.
> 
> Regards,
> Sourav
> On Sat, Feb 27, 2016 at 12:48 PM, moon soo Lee wrote:
> Hi Zeppelin users and developers,
> 
> The roadmap we have published at
> https://cwiki.apache.org/confluence/display/ZEPPELIN/Zeppelin+Roadmap 
> 
> is almost 9 month old, and it doesn't reflect where the community goes 
> anymore. It's time to update.
> 
> Based on mailing list, jira issues, pullrequests, feedbacks from users, 
> conferences and meetings, I could summarize the major interest of users and 
> developers in 7 categories. Enterprise ready, Usability improvement, 
> 

Re: zeppelin multi user mode?

2016-02-26 Thread Benjamin Kim
Ahyoung,

If z-manager can be expanded to include many of the features of Cloudera 
Manager or Ambari, I think it would be a formidable installer and monitoring 
tool for Zeppelin. Many in our organization are looking for something as 
production level as this. Are there any plans to make it so?

Thanks,
Ben

> On Feb 26, 2016, at 8:37 PM, Ahyoung Ryu <ahyoungry...@gmail.com> wrote:
> 
> Hi Benjamin,
> 
> I think this conversations <https://github.com/NFLabs/z-manager/issues/11> 
> may help you about the first question : )
> 
> Best,
> Ahyoung
> 
> 
> 
> 2016년 2월 27일 (토) 오전 12:11, Benjamin Kim <bbuil...@gmail.com 
> <mailto:bbuil...@gmail.com>>님이 작성:
> Anyone know when multi-tenancy will support Spark on Yarn? And will there be 
> a simpler way of installing it using z-manager in the future?
> 
> Thanks,
> Ben
> 
>> On Feb 14, 2016, at 6:12 PM, Alexander Bezzubov <abezzu...@nflabs.com 
>> <mailto:abezzu...@nflabs.com>> wrote:
>> 
>> Benjamin,
>> z-manager consists of 2 independent applications - installer and 
>> multitenancy.
>> 
>> You can use only the second one that Hyung Sung pointed out with any 
>> spark/zeppelin version.
>> 
>> If you have further questions, please do not hesitate to ask at  
>> z-mana...@googlegroups.com <mailto:z-mana...@googlegroups.com> 
>> https://groups.google.com/forum/#!forum/z-manager 
>> <https://groups.google.com/forum/#!forum/z-manager>
>> On Thu, Feb 4, 2016, 15:13 Benjamin Kim <bbuil...@gmail.com 
>> <mailto:bbuil...@gmail.com>> wrote:
>> I forgot to mention that I don’t see Spark 1.6 in the list of versions when 
>> installing z-manager.
>> 
>> 
>>> On Feb 3, 2016, at 10:08 PM, Corneau Damien <cornead...@gmail.com 
>>> <mailto:cornead...@gmail.com>> wrote:
>>> 
>>> @Benjamin,
>>> We do support version 1.6 of Spark, see: 
>>> https://github.com/apache/incubator-zeppelin#spark-interpreter 
>>> <https://github.com/apache/incubator-zeppelin#spark-interpreter>
>>> 
>>> On Wed, Feb 3, 2016 at 9:47 PM, Benjamin Kim <bbuil...@gmail.com 
>>> <mailto:bbuil...@gmail.com>> wrote:
>>> I see that the latest version of Spark supported is 1.4.1. When will the 
>>> latest versions of Spark be supported?
>>> 
>>> Thanks,
>>> Ben
>>> 
>>> 
>>>> On Feb 3, 2016, at 7:54 PM, Hyung Sung Shim <hss...@nflabs.com 
>>>> <mailto:hss...@nflabs.com>> wrote:
>>>> 
>>>> Hello yunfeng.
>>>> 
>>>> You can also refer to 
>>>> https://github.com/NFLabs/z-manager/tree/master/multitenancy 
>>>> <https://github.com/NFLabs/z-manager/tree/master/multitenancy>.
>>>> 
>>>> Thanks. 
>>>> 
>>>> 2016-02-04 3:56 GMT+09:00 Christopher Matta <cma...@mapr.com 
>>>> <mailto:cma...@mapr.com>>:
>>>> I have had luck with a single Zeppelin installation and config directories 
>>>> in each user home directory. That way each user gets their own instance, 
>>>> and users will not interfere with each other. 
>>>> 
>>>> You can start the Zeppelin server with a config flag pointing to the config 
>>>> directory. Simply copy the config dir that comes with Zeppelin to 
>>>> ~/.zeppelin and edit the zeppelin-site.xml to change the default port for each 
>>>> user. Start like this: 
>>>> ./zeppelin.sh --config ~/.zeppelin start
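
A minimal sketch of the per-user zeppelin-site.xml change mentioned above, assuming 
the standard zeppelin.server.port property (the port value is just an example):

<property>
  <name>zeppelin.server.port</name>
  <value>8181</value>
  <description>per-user Zeppelin server port</description>
</property>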
>>>> 
>>>> 
>>>> On Wednesday, February 3, 2016, Lin, Yunfeng <yunfeng@citi.com 
>>>> <mailto:yunfeng@citi.com>> wrote:
>>>> Hi guys,
>>>> 
>>>>  
>>>> 
>>>> We are planning to use zeppelin for PROD for data scientists. One feature 
>>>> we desperately need is multi user mode.
>>>> 
>>>>  
>>>> 
>>>> Currently, zeppelin is great for single user use. However, since zeppelin 
>>>> spark context are shared among all users in one zeppelin server, it is not 
>>>> very suitable when there are multiple users on the same zeppelin server 
>>>> since they are going to interfere with each other in one spark context.
>>>> 
>>>>  
>>>> 
>>>> How do you guys address this need? Thanks.
>>>> 
>>>>  
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> Chris Matta
>>>> cma...@mapr.com <mailto:cma...@mapr.com>
>>>> 215-701-3146 
>>>> 
>>> 
>>> 
>> 
> 



Data Export

2016-02-26 Thread Benjamin Kim
I don’t know if I’m missing something, but is there a way to export the result 
data into a CSV, Excel, etc. from a SQL statement?

Thanks,
Ben
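
In the meantime, a rough workaround sketch from the Spark side (assuming the result 
comes from a Spark SQL paragraph; the table name and output path are placeholders, 
and this naive CSV conversion does no quoting or escaping of fields):

val result = sqlContext.sql("SELECT * FROM my_table")
result.rdd
  .map(_.mkString(","))               // naive CSV: comma-join each row
  .saveAsTextFile("/tmp/my_table_csv") // one part file per partition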



Re: zeppelin multi user mode?

2016-02-26 Thread Benjamin Kim
Anyone know when multi-tenancy will support Spark on Yarn? And will there be a 
simpler way of installing it using z-manager in the future?

Thanks,
Ben

> On Feb 14, 2016, at 6:12 PM, Alexander Bezzubov <abezzu...@nflabs.com> wrote:
> 
> Benjamin,
> z-manager consists of 2 independent applications - installer and multitenancy.
> 
> You can use only the second one that Hyung Sung pointed out with any 
> spark/zeppelin version.
> 
> If you have further questions, please do not hesitate to ask at  
> z-mana...@googlegroups.com <mailto:z-mana...@googlegroups.com> 
> https://groups.google.com/forum/#!forum/z-manager 
> <https://groups.google.com/forum/#!forum/z-manager>
> On Thu, Feb 4, 2016, 15:13 Benjamin Kim <bbuil...@gmail.com 
> <mailto:bbuil...@gmail.com>> wrote:
> I forgot to mention that I don’t see Spark 1.6 in the list of versions when 
> installing z-manager.
> 
> 
>> On Feb 3, 2016, at 10:08 PM, Corneau Damien <cornead...@gmail.com 
>> <mailto:cornead...@gmail.com>> wrote:
>> 
>> @Benjamin,
>> We do support version 1.6 of Spark, see: 
>> https://github.com/apache/incubator-zeppelin#spark-interpreter 
>> <https://github.com/apache/incubator-zeppelin#spark-interpreter>
>> 
>> On Wed, Feb 3, 2016 at 9:47 PM, Benjamin Kim <bbuil...@gmail.com 
>> <mailto:bbuil...@gmail.com>> wrote:
>> I see that the latest version of Spark supported is 1.4.1. When will the 
>> latest versions of Spark be supported?
>> 
>> Thanks,
>> Ben
>> 
>> 
>>> On Feb 3, 2016, at 7:54 PM, Hyung Sung Shim <hss...@nflabs.com 
>>> <mailto:hss...@nflabs.com>> wrote:
>>> 
>>> Hello yunfeng.
>>> 
>>> You can also refer to 
>>> https://github.com/NFLabs/z-manager/tree/master/multitenancy 
>>> <https://github.com/NFLabs/z-manager/tree/master/multitenancy>.
>>> 
>>> Thanks. 
>>> 
>>> 2016-02-04 3:56 GMT+09:00 Christopher Matta <cma...@mapr.com 
>>> <mailto:cma...@mapr.com>>:
>>> I have had luck with a single Zeppelin installation and config directories 
>>> in each user home directory. That way each user gets their own instance, and 
>>> users will not interfere with each other. 
>>> 
>>> You can start the Zeppelin server with a config flag pointing to the config 
>>> directory. Simply copy the config dir that comes with Zeppelin to 
>>> ~/.zeppelin and edit the zeppelin-site.xml to change the default port for each 
>>> user. Start like this: 
>>> ./zeppelin.sh --config ~/.zeppelin start
>>> 
>>> 
>>> On Wednesday, February 3, 2016, Lin, Yunfeng <yunfeng@citi.com 
>>> <mailto:yunfeng@citi.com>> wrote:
>>> Hi guys,
>>> 
>>>  
>>> 
>>> We are planning to use zeppelin for PROD for data scientists. One feature 
>>> we desperately need is multi user mode.
>>> 
>>>  
>>> 
>>> Currently, zeppelin is great for single user use. However, since zeppelin 
>>> spark context are shared among all users in one zeppelin server, it is not 
>>> very suitable when there are multiple users on the same zeppelin server 
>>> since they are going to interfere with each other in one spark context.
>>> 
>>>  
>>> 
>>> How do you guys address this need? Thanks.
>>> 
>>>  
>>> 
>>> 
>>> 
>>> -- 
>>> Chris Matta
>>> cma...@mapr.com <mailto:cma...@mapr.com>
>>> 215-701-3146 
>>> 
>> 
>> 
> 



Re: Spark on Kudu

2016-02-24 Thread Benjamin Kim
J-D,

It looks like it fulfills most of the basic requirements (kudu RDD, kudu 
DStream) in KUDU-1214. Am I right? Besides shoring up more Spark SQL 
functionality (Dataframes) and doing the documentation, what more needs to be 
done? Optimizations?

I believe that it’s a good place to start using Spark with Kudu and to compare it 
to HBase with Spark (which is not as clean).

Thanks,
Ben


> On Feb 24, 2016, at 3:10 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> 
> AFAIK no one is working on it, but we did manage to get this in for 0.7.0: 
> https://issues.cloudera.org/browse/KUDU-1321 
> <https://issues.cloudera.org/browse/KUDU-1321>
> 
> It's a really simple wrapper, and yes you can use SparkSQL on Kudu, but it 
> will require a lot more work to make it fast/useful.
> 
> Hope this helps,
> 
> J-D
> 
> On Wed, Feb 24, 2016 at 3:08 PM, Benjamin Kim <bbuil...@gmail.com 
> <mailto:bbuil...@gmail.com>> wrote:
> I see this KUDU-1214 <https://issues.cloudera.org/browse/KUDU-1214> targeted 
> for 0.8.0, but I see no progress on it. When this is complete, will this mean 
> that Spark will be able to work with Kudu both programmatically and as a 
> client via Spark SQL? Or is there more work that needs to be done on the 
> Spark side for it to work?
> 
> Just curious.
> 
> Cheers,
> Ben
> 
> 



Spark on Kudu

2016-02-24 Thread Benjamin Kim
I see this KUDU-1214  targeted 
for 0.8.0, but I see no progress on it. When this is complete, will this mean 
that Spark will be able to work with Kudu both programmatically and as a client 
via Spark SQL? Or is there more work that needs to be done on the Spark side 
for it to work?

Just curious.

Cheers,
Ben



Re: HBase Interpreter

2016-02-23 Thread Benjamin Kim
Hi Felix,

Any updates? Does the latest merged master have the hbase quorum properties?

Thanks,
Ben


> On Feb 12, 2016, at 1:29 AM, Felix Cheung <felixcheun...@hotmail.com> wrote:
> 
> Cool, I think I have figured out how to set properties too. I might open a PR 
> tomorrow or later.
> 
> 
> 
> 
> 
> On Thu, Feb 11, 2016 at 9:24 PM -0800, "Rajat Venkatesh" 
> <rvenkat...@qubole.com <mailto:rvenkat...@qubole.com>> wrote:
> 
> Hi,
> I'll take a look over the weekend. Sorry for the delay in replying. 
> 
> On Wed, Feb 10, 2016 at 6:44 AM Felix Cheung <felixcheun...@hotmail.com 
> <mailto:felixcheun...@hotmail.com>> wrote:
> It looks like hbase-site.xml is not picked up somehow.
> 
> Rajat would you know of a way to get that set with the ruby code?
> 
> 
> _
> From: Benjamin Kim <bbuil...@gmail.com <mailto:bbuil...@gmail.com>>
> Sent: Tuesday, February 9, 2016 2:58 PM
> 
> Subject: Re: HBase Interpreter
> To: <users@zeppelin.incubator.apache.org 
> <mailto:users@zeppelin.incubator.apache.org>>
> 
> 
> It looks like it’s not reaching the zookeeper quorum.
> 
> 16/02/09 21:52:19 ERROR client.ConnectionManager$HConnectionImplementation: 
> Can't get connection to ZooKeeper: KeeperErrorCode = ConnectionLoss for /hbase
> 
> And the setting is:
> 
> quorum=localhost:2181
> 
> The HBase quorum is actually namenode001, namenode002, hbase-master001. Where 
> do I set this?
> 
> Thanks,
> Ben
> 
> 
> On Feb 4, 2016, at 9:15 PM, Felix Cheung < felixcheun...@hotmail.com 
> <mailto:felixcheun...@hotmail.com>> wrote:
> 
> We could probably look into HBase/Pom.xml handling the vendor-repo profile 
> too.
> 
> 
> 
> 
> 
> On Thu, Feb 4, 2016 at 8:08 PM -0800, "Rajat Venkatesh" 
> <rvenkat...@qubole.com <mailto:rvenkat...@qubole.com>> wrote: 
> 
> Benjamin,
> Can you try compiling Zeppelin by changing the dependencies in hbase/pom.xml 
> to use cloudera jars ? 
> In the long run, one option is to
> 1. run & capture o/p of 'bin/hbase classpath'
> 2. create a classloader
> 3. load all the classes from 1
> 
> Then it will work with any version of HBase theoretically.
>  
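
A rough sketch of that classloader idea (the hbase script path and the sample class 
below are placeholders, and error handling is omitted):

// capture the output of `bin/hbase classpath` and load HBase classes
// through a dedicated URLClassLoader, independent of Zeppelin's own classpath
import java.io.File
import java.net.URLClassLoader
import scala.sys.process._

val cp = Seq("/usr/lib/hbase/bin/hbase", "classpath").!!.trim
val urls = cp.split(File.pathSeparator).map(p => new File(p).toURI.toURL)
val loader = new URLClassLoader(urls, getClass.getClassLoader)
// e.g. resolve an HBase client class against the vendor's jars
val connFactory = loader.loadClass("org.apache.hadoop.hbase.client.ConnectionFactory")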
> 
> On Fri, Feb 5, 2016 at 8:14 AM Benjamin Kim < bbuil...@gmail.com 
> <mailto:bbuil...@gmail.com>> wrote: 
> Felix,
> 
> I know that Cloudera practice. We hate that they do that without informing 
> anyone.
> 
> Thanks,
> Ben
> 
> 
> 
> On Feb 4, 2016, at 9:18 AM, Felix Cheung < felixcheun...@hotmail.com 
> <mailto:felixcheun...@hotmail.com>> wrote:
> 
> CDH is known to cherry pick patches from later releases. Maybe it is because 
> of that.
> 
> Rajat do you have any lead on the release compatibility issue?
> 
> 
> _ 
> From: Rajat Venkatesh < rvenkat...@qubole.com <mailto:rvenkat...@qubole.com>> 
> Sent: Wednesday, February 3, 2016 10:05 PM 
> Subject: Re: HBase Interpreter 
> To: < users@zeppelin.incubator.apache.org 
> <mailto:users@zeppelin.incubator.apache.org>> 
> 
> 
> Oh. That should work. I've tested with 1.0.0. Hmm
> 
> On Thu, Feb 4, 2016 at 10:50 AM Benjamin Kim < bbuil...@gmail.com 
> <mailto:bbuil...@gmail.com>> wrote: 
> Hi Rajat,
> 
> The version of HBase that comes with CDH 5.4.8 is 1.0.0. How do I check if 
> they are compatible?
> 
> Thanks,
> Ben
> 
> 
> On Feb 3, 2016, at 9:16 PM, Rajat Venkatesh < rvenkat...@qubole.com 
> <mailto:rvenkat...@qubole.com>> wrote:
> 
> Can you check the version of HBase ? HBase interpreter has been tested with 
> HBase 1.0.x and Hadoop 2.6.0. There is a good chance this error is due to 
> mismatch in versions. 
> 
> On Thu, Feb 4, 2016 at 10:20 AM Benjamin Kim < bbuil...@gmail.com 
> <mailto:bbuil...@gmail.com>> wrote: 
> I got this error below trying out the new HBase Interpreter after pulling and 
> compiling the latest. 
> 
> org.jruby.exceptions.RaiseException: (NameError) cannot load Java class 
> org.apache.hadoop.hbase.quotas.ThrottleType 
> at 
> org.jruby.javasupport.JavaUtilities.get_proxy_or_package_under_package(org/jruby/javasupport/JavaUtilities.java:54)
>  
> at (Anonymous).method_missing(/builtin/javasupport/java.rb:51) 
> at 
> (Anonymous).(root)(/opt/cloudera/parcels/CDH/lib/hbase/lib/ruby/hbase/quotas.rb:23)
>  
> at org.jruby.RubyKernel.require(org/jruby/RubyKernel.java:1062) 
> at 
> (Anonymous).(root)(/opt/cloudera/parcels/CDH/lib/hbase/lib/ruby/hbase/quotas.rb:24)
>  
> at org.jruby.RubyKernel.require(org/jruby/RubyKernel.java:1062) 
> at 
> (Anonymous).(root)(/opt/cloudera/parcels/CDH/lib/hbase/lib/ruby/hbase/hbase.rb:90)
>  
> at org.jruby.RubyKernel.require(org/jruby/RubyKernel.java:1062) 
> at 
> (Anonymous).(root)(/opt/cloudera/parcels/CDH/lib/hbase/lib/ruby/hbase.rb:118) 
> 
> Is there something I’m missing. Is it because I’m using CDH 5.4.8? 
> 
> Thanks, 
> Ben
> 
> 
> 
> 
> 
> 
> 



Re: Kudu Release

2016-02-23 Thread Benjamin Kim
Jean,

Very organized outline. Looking forward to the 0.7 release. I am hoping that 
most of your points are addressed and completed by 1.0 release this fall.

Thanks,
Ben


> On Feb 23, 2016, at 8:31 AM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> 
> Hi Ben,
> 
> Please see this thread on the dev list: 
> http://mail-archives.apache.org/mod_mbox/incubator-kudu-dev/201602.mbox/%3CCAGpTDNcMBWwX8p%2ByGKzHfL2xcmKTScU-rhLcQFSns1UVSbrXhw%40mail.gmail.com%3E
>  
> <http://mail-archives.apache.org/mod_mbox/incubator-kudu-dev/201602.mbox/%3CCAGpTDNcMBWwX8p%2ByGKzHfL2xcmKTScU-rhLcQFSns1UVSbrXhw%40mail.gmail.com%3E>
> 
> Thanks,
> 
> J-D
> 
> On Tue, Feb 23, 2016 at 8:23 AM, Benjamin Kim <bbuil...@gmail.com 
> <mailto:bbuil...@gmail.com>> wrote:
> Any word as to the release roadmap?
> 
> Thanks,
> Ben
> 



Re: Cloudera and Phoenix

2016-02-21 Thread Benjamin Kim
I don’t know if Cloudera will support Phoenix going forward. There are a few 
things that lead me to think this:
- No activity on a new port of Phoenix 4.6 or 4.7 in Cloudera Labs, as mentioned 
  below
- In the Cloudera Community groups, I got no reply to my question about help 
  compiling Phoenix 4.7 for CDH
- In the Spark Users groups, there’s active discussion about the Spark on HBase 
  module developed by Cloudera, which is expected out in early summer:
  http://blog.cloudera.com/blog/2015/08/apache-spark-comes-to-apache-hbase-with-hbase-spark-module/

My bet is that Cloudera is going with the Spark solution since it’s their baby, 
and it can natively work with HBase tables directly. So, would this mean that 
Phoenix is a no-go for CDH going forward? I hope not.

Cheers,
Ben


> On Feb 21, 2016, at 11:15 AM, James Taylor <jamestay...@apache.org> wrote:
> 
> Hi Dor,
> 
> Whether or not Phoenix becomes part of CDH is not under our control. It *is* 
> under your control, though (assuming you're a customer of CDH). The *only* 
> way Phoenix will transition from being in Cloudera Labs to being part of the 
> official CDH distro is if you and other customers demand it.
> 
> Thanks,
> James
> 
> On Sun, Feb 21, 2016 at 10:03 AM, Dor Ben Dov <dor.ben-...@amdocs.com 
> <mailto:dor.ben-...@amdocs.com>> wrote:
> Stephen
> 
>  
> 
> Any plans or do you or anyone where see the possibility that it will be 
> although all below as official release ?
> 
>  
> 
> Dor
> 
>  
> 
> From: Stephen Wilcoxon [mailto:wilco...@gmail.com 
> <mailto:wilco...@gmail.com>] 
> Sent: יום א 21 פברואר 2016 19:37
> To: user@phoenix.apache.org <mailto:user@phoenix.apache.org>
> Subject: Re: Cloudera and Phoenix
> 
>  
> 
> As of a few months ago, Cloudera includes Phoenix as a "lab" (basically beta) 
> but it was out-of-date.  From what I gather, the official Phoenix releases 
> will not run on Cloudera without modifications (someone was doing unofficial 
> Phoenix/Cloudera releases but I'm not sure if they still are or not).
> 
>  
> 
> On Sun, Feb 21, 2016 at 6:39 AM, Dor Ben Dov <dor.ben-...@amdocs.com 
> <mailto:dor.ben-...@amdocs.com>> wrote:
> 
> Hi All,
> 
>  
> 
> Do we have Phoenix release officially in Cloudera ? any plan to if not ?
> 
>  
> 
> Regards,
> 
>  
> 
> Dor ben Dov
> 
>  
> 
> From: Benjamin Kim [mailto:bbuil...@gmail.com <mailto:bbuil...@gmail.com>] 
> Sent: יום ו 19 פברואר 2016 19:41
> To: user@phoenix.apache.org <mailto:user@phoenix.apache.org>
> Subject: Re: Spark Phoenix Plugin
> 
>  
> 
> All,
> 
>  
> 
> Thanks for the help. I have switched out Cloudera’s HBase 1.0.0 with the 
> current Apache HBase 1.1.3. Also, I installed Phoenix 4.7.0, and everything 
> works fine except for the Phoenix Spark Plugin. I wonder if it’s a version 
> incompatibility issue with Spark 1.6. Has anyone tried compiling 4.7.0 using 
> Spark 1.6?
> 
>  
> 
> Thanks,
> 
> Ben
> 
>  
> 
> On Feb 12, 2016, at 6:33 AM, Benjamin Kim <bbuil...@gmail.com 
> <mailto:bbuil...@gmail.com>> wrote:
> 
>  
> 
> Anyone know when Phoenix 4.7 will be officially released? And what Cloudera 
> distribution versions will it be compatible with?
> 
>  
> 
> Thanks,
> 
> Ben
> 
>  
> 
> On Feb 10, 2016, at 11:03 AM, Benjamin Kim <bbuil...@gmail.com 
> <mailto:bbuil...@gmail.com>> wrote:
> 
>  
> 
> Hi Pierre,
> 
>  
> 
> I am getting this error now.
> 
>  
> 
> Error: org.apache.phoenix.exception.PhoenixIOException: 
> org.apache.hadoop.hbase.DoNotRetryIOException: 
> SYSTEM.CATALOG,,1453397732623.8af7b44f3d7609eb301ad98641ff2611.: 
> org.apache.hadoop.hbase.client.Delete.setAttribute(Ljava/lang/String;[B)Lorg/apache/hadoop/hbase/client/Delete;
> 
>  
> 
> I even tried to use sqlline.py to do some queries too. It resulted in the 
> same error. I followed the installation instructions. Is there something 
> missing?
> 
>  
> 
> Thanks,
> 
> Ben
> 
>  
> 
>  
> 
> On Feb 9, 2016, at 10:20 AM, Ravi Kiran <maghamraviki...@gmail.com 
> <mailto:maghamraviki...@gmail.com>> wrote:
> 
>  
> 
> Hi Pierre,
> 
>  
> 
>   Try your luck for building the artifacts from 
> https://github.com/chiastic-security/phoenix-for-cloudera 
> <https://github.com/chiastic-security/phoenix-for-cloudera>. Hopefully it 
> helps.
> 
>  
> 
> Regards
> 
> Ravi .
> 
>  
> 
> On Tue, Feb 9, 2016 at

Re: Spark Phoenix Plugin

2016-02-20 Thread Benjamin Kim
Josh,

My production environment at our company is:

CDH 5.4.8:
  Hadoop 2.6.0-cdh5.4.8
  YARN 2.6.0-cdh5.4.8
  HBase 1.0.0-cdh5.4.8
Apache:
  HBase 1.1.3
  Spark 1.6.0
  Phoenix 4.7.0

I tried to use the Phoenix Spark Plugin against both versions of HBase.

I hope this helps.

Thanks,
Ben


> On Feb 20, 2016, at 7:37 AM, Josh Mahonin <jmaho...@gmail.com> wrote:
> 
> Hi Ben,
> 
> Can you describe in more detail what your environment is? Are you using stock 
> installs of HBase, Spark and Phoenix? Are you using the hadoop2.4 pre-built 
> Spark distribution as per the documentation [1]?
> 
> The unread block data error is commonly traced back to this issue [2] which 
> indicates some sort of mismatched version problem..
> 
> Thanks,
> 
> Josh
> 
> [1] https://phoenix.apache.org/phoenix_spark.html 
> <https://phoenix.apache.org/phoenix_spark.html>
> [2] https://issues.apache.org/jira/browse/SPARK-1867 
> <https://issues.apache.org/jira/browse/SPARK-1867>
> 
> On Fri, Feb 19, 2016 at 2:18 PM, Benjamin Kim <bbuil...@gmail.com 
> <mailto:bbuil...@gmail.com>> wrote:
> Hi Josh,
> 
> When I run the following code in spark-shell for spark 1.6:
> 
> import org.apache.phoenix.spark._
> val df = sqlContext.load("org.apache.phoenix.spark", Map("table" -> 
> "TEST.MY_TEST", "zkUrl" -> “zk1,zk2,zk3:2181"))
> df.select(df("ID")).show()
> 
> I get this error:
> 
> java.lang.IllegalStateException: unread block data
> 
> Thanks,
> Ben
> 
> 
>> On Feb 19, 2016, at 11:12 AM, Josh Mahonin <jmaho...@gmail.com 
>> <mailto:jmaho...@gmail.com>> wrote:
>> 
>> What specifically doesn't work for you?
>> 
>> I have a Docker image that I used to do some basic testing on it with and 
>> haven't run into any problems:
>> https://github.com/jmahonin/docker-phoenix/tree/phoenix_spark 
>> <https://github.com/jmahonin/docker-phoenix/tree/phoenix_spark>
>> 
>> On Fri, Feb 19, 2016 at 12:40 PM, Benjamin Kim <bbuil...@gmail.com 
>> <mailto:bbuil...@gmail.com>> wrote:
>> All,
>> 
>> Thanks for the help. I have switched out Cloudera’s HBase 1.0.0 with the 
>> current Apache HBase 1.1.3. Also, I installed Phoenix 4.7.0, and everything 
>> works fine except for the Phoenix Spark Plugin. I wonder if it’s a version 
>> incompatibility issue with Spark 1.6. Has anyone tried compiling 4.7.0 using 
>> Spark 1.6?
>> 
>> Thanks,
>> Ben
>> 
>>> On Feb 12, 2016, at 6:33 AM, Benjamin Kim <bbuil...@gmail.com 
>>> <mailto:bbuil...@gmail.com>> wrote:
>>> 
>>> Anyone know when Phoenix 4.7 will be officially released? And what Cloudera 
>>> distribution versions will it be compatible with?
>>> 
>>> Thanks,
>>> Ben
>>> 
>>>> On Feb 10, 2016, at 11:03 AM, Benjamin Kim <bbuil...@gmail.com 
>>>> <mailto:bbuil...@gmail.com>> wrote:
>>>> 
>>>> Hi Pierre,
>>>> 
>>>> I am getting this error now.
>>>> 
>>>> Error: org.apache.phoenix.exception.PhoenixIOException: 
>>>> org.apache.hadoop.hbase.DoNotRetryIOException: 
>>>> SYSTEM.CATALOG,,1453397732623.8af7b44f3d7609eb301ad98641ff2611.: 
>>>> org.apache.hadoop.hbase.client.Delete.setAttribute(Ljava/lang/String;[B)Lorg/apache/hadoop/hbase/client/Delete;
>>>> 
>>>> I even tried to use sqlline.py to do some queries too. It resulted in the 
>>>> same error. I followed the installation instructions. Is there something 
>>>> missing?
>>>> 
>>>> Thanks,
>>>> Ben
>>>> 
>>>> 
>>>>> On Feb 9, 2016, at 10:20 AM, Ravi Kiran <maghamraviki...@gmail.com 
>>>>> <mailto:maghamraviki...@gmail.com>> wrote:
>>>>> 
>>>>> Hi Pierre,
>>>>> 
>>>>>   Try your luck for building the artifacts from 
>>>>> https://github.com/chiastic-security/phoenix-for-cloudera 
>>>>> <https://github.com/chiastic-security/phoenix-for-cloudera>. Hopefully it 
>>>>> helps.
>>>>> 
>>>>> Regards
>>>>> Ravi .
>>>>> 
>>>>> On Tue, Feb 9, 2016 at 10:04 AM, Benjamin Kim <bbuil...@gmail.com 
>>>>> <mailto:bbuil...@gmail.com>> wrote:
>>>>> Hi Pierre,
>>>>> 
>>>>> I found this article about how Cloudera’s version of HBase is very 
>>>>> different than Apache HBas

Re: Spark Phoenix Plugin

2016-02-19 Thread Benjamin Kim
Hi Josh,

When I run the following code in spark-shell for spark 1.6:

import org.apache.phoenix.spark._
val df = sqlContext.load("org.apache.phoenix.spark", Map("table" -> 
"TEST.MY_TEST", "zkUrl" -> “zk1,zk2,zk3:2181"))
df.select(df("ID")).show()

I get this error:

java.lang.IllegalStateException: unread block data

Thanks,
Ben
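
For reference, one way to put the Spark-specific Phoenix client jar on both the 
driver and executor classpaths when launching the shell (a sketch only; the jar 
path is the local install path mentioned elsewhere in this thread):

spark-shell \
  --conf spark.driver.extraClassPath=/opt/tools/phoenix/phoenix-4.7.0-HBase-1.1-client-spark.jar \
  --conf spark.executor.extraClassPath=/opt/tools/phoenix/phoenix-4.7.0-HBase-1.1-client-spark.jar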


> On Feb 19, 2016, at 11:12 AM, Josh Mahonin <jmaho...@gmail.com> wrote:
> 
> What specifically doesn't work for you?
> 
> I have a Docker image that I used to do some basic testing on it with and 
> haven't run into any problems:
> https://github.com/jmahonin/docker-phoenix/tree/phoenix_spark 
> <https://github.com/jmahonin/docker-phoenix/tree/phoenix_spark>
> 
> On Fri, Feb 19, 2016 at 12:40 PM, Benjamin Kim <bbuil...@gmail.com 
> <mailto:bbuil...@gmail.com>> wrote:
> All,
> 
> Thanks for the help. I have switched out Cloudera’s HBase 1.0.0 with the 
> current Apache HBase 1.1.3. Also, I installed Phoenix 4.7.0, and everything 
> works fine except for the Phoenix Spark Plugin. I wonder if it’s a version 
> incompatibility issue with Spark 1.6. Has anyone tried compiling 4.7.0 using 
> Spark 1.6?
> 
> Thanks,
> Ben
> 
>> On Feb 12, 2016, at 6:33 AM, Benjamin Kim <bbuil...@gmail.com 
>> <mailto:bbuil...@gmail.com>> wrote:
>> 
>> Anyone know when Phoenix 4.7 will be officially released? And what Cloudera 
>> distribution versions will it be compatible with?
>> 
>> Thanks,
>> Ben
>> 
>>> On Feb 10, 2016, at 11:03 AM, Benjamin Kim <bbuil...@gmail.com 
>>> <mailto:bbuil...@gmail.com>> wrote:
>>> 
>>> Hi Pierre,
>>> 
>>> I am getting this error now.
>>> 
>>> Error: org.apache.phoenix.exception.PhoenixIOException: 
>>> org.apache.hadoop.hbase.DoNotRetryIOException: 
>>> SYSTEM.CATALOG,,1453397732623.8af7b44f3d7609eb301ad98641ff2611.: 
>>> org.apache.hadoop.hbase.client.Delete.setAttribute(Ljava/lang/String;[B)Lorg/apache/hadoop/hbase/client/Delete;
>>> 
>>> I even tried to use sqlline.py to do some queries too. It resulted in the 
>>> same error. I followed the installation instructions. Is there something 
>>> missing?
>>> 
>>> Thanks,
>>> Ben
>>> 
>>> 
>>>> On Feb 9, 2016, at 10:20 AM, Ravi Kiran <maghamraviki...@gmail.com 
>>>> <mailto:maghamraviki...@gmail.com>> wrote:
>>>> 
>>>> Hi Pierre,
>>>> 
>>>>   Try your luck for building the artifacts from 
>>>> https://github.com/chiastic-security/phoenix-for-cloudera 
>>>> <https://github.com/chiastic-security/phoenix-for-cloudera>. Hopefully it 
>>>> helps.
>>>> 
>>>> Regards
>>>> Ravi .
>>>> 
>>>> On Tue, Feb 9, 2016 at 10:04 AM, Benjamin Kim <bbuil...@gmail.com 
>>>> <mailto:bbuil...@gmail.com>> wrote:
>>>> Hi Pierre,
>>>> 
>>>> I found this article about how Cloudera’s version of HBase is very 
>>>> different than Apache HBase so it must be compiled using Cloudera’s repo 
>>>> and versions. But, I’m not having any success with it.
>>>> 
>>>> http://stackoverflow.com/questions/31849454/using-phoenix-with-cloudera-hbase-installed-from-repo
>>>>  
>>>> <http://stackoverflow.com/questions/31849454/using-phoenix-with-cloudera-hbase-installed-from-repo>
>>>> 
>>>> There’s also a Chinese site that does the same thing.
>>>> 
>>>> https://www.zybuluo.com/xtccc/note/205739 
>>>> <https://www.zybuluo.com/xtccc/note/205739>
>>>> 
>>>> I keep getting errors like the one’s below.
>>>> 
>>>> [ERROR] 
>>>> /opt/tools/phoenix/phoenix-core/src/main/java/org/apache/hadoop/hbase/regionserver/LocalIndexMerger.java:[110,29]
>>>>  cannot find symbol
>>>> [ERROR] symbol:   class Region
>>>> [ERROR] location: class 
>>>> org.apache.hadoop.hbase.regionserver.LocalIndexMerger
>>>> …
>>>> 
>>>> Have you tried this also?
>>>> 
>>>> As a last resort, we will have to abandon Cloudera’s HBase for Apache’s 
>>>> HBase.
>>>> 
>>>> Thanks,
>>>> Ben
>>>> 
>>>> 
>>>>> On Feb 8, 2016, at 11:04 PM, pierre lacave <pie...@lacave.me 
>>>>> <mailto:pie...@lacave.me>> wrote:
>>>>> 
>&

Re: Spark Phoenix Plugin

2016-02-19 Thread Benjamin Kim
All,

Thanks for the help. I have switched out Cloudera’s HBase 1.0.0 with the 
current Apache HBase 1.1.3. Also, I installed Phoenix 4.7.0, and everything 
works fine except for the Phoenix Spark Plugin. I wonder if it’s a version 
incompatibility issue with Spark 1.6. Has anyone tried compiling 4.7.0 using 
Spark 1.6?

Thanks,
Ben

> On Feb 12, 2016, at 6:33 AM, Benjamin Kim <bbuil...@gmail.com> wrote:
> 
> Anyone know when Phoenix 4.7 will be officially released? And what Cloudera 
> distribution versions will it be compatible with?
> 
> Thanks,
> Ben
> 
>> On Feb 10, 2016, at 11:03 AM, Benjamin Kim <bbuil...@gmail.com 
>> <mailto:bbuil...@gmail.com>> wrote:
>> 
>> Hi Pierre,
>> 
>> I am getting this error now.
>> 
>> Error: org.apache.phoenix.exception.PhoenixIOException: 
>> org.apache.hadoop.hbase.DoNotRetryIOException: 
>> SYSTEM.CATALOG,,1453397732623.8af7b44f3d7609eb301ad98641ff2611.: 
>> org.apache.hadoop.hbase.client.Delete.setAttribute(Ljava/lang/String;[B)Lorg/apache/hadoop/hbase/client/Delete;
>> 
>> I even tried to use sqlline.py to do some queries too. It resulted in the 
>> same error. I followed the installation instructions. Is there something 
>> missing?
>> 
>> Thanks,
>> Ben
>> 
>> 
>>> On Feb 9, 2016, at 10:20 AM, Ravi Kiran <maghamraviki...@gmail.com 
>>> <mailto:maghamraviki...@gmail.com>> wrote:
>>> 
>>> Hi Pierre,
>>> 
>>>   Try your luck for building the artifacts from 
>>> https://github.com/chiastic-security/phoenix-for-cloudera 
>>> <https://github.com/chiastic-security/phoenix-for-cloudera>. Hopefully it 
>>> helps.
>>> 
>>> Regards
>>> Ravi .
>>> 
>>> On Tue, Feb 9, 2016 at 10:04 AM, Benjamin Kim <bbuil...@gmail.com 
>>> <mailto:bbuil...@gmail.com>> wrote:
>>> Hi Pierre,
>>> 
>>> I found this article about how Cloudera’s version of HBase is very 
>>> different than Apache HBase so it must be compiled using Cloudera’s repo 
>>> and versions. But, I’m not having any success with it.
>>> 
>>> http://stackoverflow.com/questions/31849454/using-phoenix-with-cloudera-hbase-installed-from-repo
>>>  
>>> <http://stackoverflow.com/questions/31849454/using-phoenix-with-cloudera-hbase-installed-from-repo>
>>> 
>>> There’s also a Chinese site that does the same thing.
>>> 
>>> https://www.zybuluo.com/xtccc/note/205739 
>>> <https://www.zybuluo.com/xtccc/note/205739>
>>> 
>>> I keep getting errors like the one’s below.
>>> 
>>> [ERROR] 
>>> /opt/tools/phoenix/phoenix-core/src/main/java/org/apache/hadoop/hbase/regionserver/LocalIndexMerger.java:[110,29]
>>>  cannot find symbol
>>> [ERROR] symbol:   class Region
>>> [ERROR] location: class 
>>> org.apache.hadoop.hbase.regionserver.LocalIndexMerger
>>> …
>>> 
>>> Have you tried this also?
>>> 
>>> As a last resort, we will have to abandon Cloudera’s HBase for Apache’s 
>>> HBase.
>>> 
>>> Thanks,
>>> Ben
>>> 
>>> 
>>>> On Feb 8, 2016, at 11:04 PM, pierre lacave <pie...@lacave.me 
>>>> <mailto:pie...@lacave.me>> wrote:
>>>> 
>>>> Havent met that one.
>>>> 
>>>> According to SPARK-1867, the real issue is hidden.
>>>> 
>>>> I d process by elimination, maybe try in local[*] mode first
>>>> 
>>>> https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-1867 
>>>> <https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-1867>
>>>> On Tue, 9 Feb 2016, 04:58 Benjamin Kim <bbuil...@gmail.com 
>>>> <mailto:bbuil...@gmail.com>> wrote:
>>>> Pierre,
>>>> 
>>>> I got it to work using phoenix-4.7.0-HBase-1.0-client-spark.jar. But, now, 
>>>> I get this error:
>>>> 
>>>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 
>>>> in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 
>>>> 0.0 (TID 3, prod-dc1-datanode151.pdc1i.gradientx.com 
>>>> <http://prod-dc1-datanode151.pdc1i.gradientx.com/>): 
>>>> java.lang.IllegalStateException: unread block data
>>>> 
>>>> It happens when I do:
>>>> 
>>>> df.show()
>>>> 
>>>> Getting closer…
>>>> 
>>>> Thanks,
&g

Re: SparkOnHBase : Which version of Spark its available

2016-02-17 Thread Benjamin Kim
Ted,

Any idea as to when this will be released?

Thanks,
Ben


> On Feb 17, 2016, at 2:53 PM, Ted Yu  wrote:
> 
> The HBASE JIRA below is for HBase 2.0
> 
> HBase Spark module would be back ported to hbase 1.3.0
> 
> FYI 
> 
> On Feb 17, 2016, at 1:13 PM, Chandeep Singh wrote:
> 
>> HBase-Spark module was added in 1.3
>> 
>> https://issues.apache.org/jira/browse/HBASE-13992 
>> 
>> 
>> http://blog.cloudera.com/blog/2015/08/apache-spark-comes-to-apache-hbase-with-hbase-spark-module/
>>  
>> 
>> http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/ 
>> 
>> 
>>> On Feb 17, 2016, at 9:44 AM, Divya Gehlot wrote:
>>> 
>>> Hi,
>>> 
>>> SparkonHBase is integrated with which version of Spark and HBase ?
>>> 
>>> 
>>> 
>>> 
>>> 
>>> Thanks,
>>> Divya 
>> 



Re: Spark Phoenix Plugin

2016-02-12 Thread Benjamin Kim
Anyone know when Phoenix 4.7 will be officially released? And what Cloudera 
distribution versions will it be compatible with?

Thanks,
Ben

> On Feb 10, 2016, at 11:03 AM, Benjamin Kim <bbuil...@gmail.com> wrote:
> 
> Hi Pierre,
> 
> I am getting this error now.
> 
> Error: org.apache.phoenix.exception.PhoenixIOException: 
> org.apache.hadoop.hbase.DoNotRetryIOException: 
> SYSTEM.CATALOG,,1453397732623.8af7b44f3d7609eb301ad98641ff2611.: 
> org.apache.hadoop.hbase.client.Delete.setAttribute(Ljava/lang/String;[B)Lorg/apache/hadoop/hbase/client/Delete;
> 
> I even tried to use sqlline.py to do some queries too. It resulted in the 
> same error. I followed the installation instructions. Is there something 
> missing?
> 
> Thanks,
> Ben
> 
> 
>> On Feb 9, 2016, at 10:20 AM, Ravi Kiran <maghamraviki...@gmail.com 
>> <mailto:maghamraviki...@gmail.com>> wrote:
>> 
>> Hi Pierre,
>> 
>>   Try your luck for building the artifacts from 
>> https://github.com/chiastic-security/phoenix-for-cloudera 
>> <https://github.com/chiastic-security/phoenix-for-cloudera>. Hopefully it 
>> helps.
>> 
>> Regards
>> Ravi .
>> 
>> On Tue, Feb 9, 2016 at 10:04 AM, Benjamin Kim <bbuil...@gmail.com 
>> <mailto:bbuil...@gmail.com>> wrote:
>> Hi Pierre,
>> 
>> I found this article about how Cloudera’s version of HBase is very different 
>> than Apache HBase so it must be compiled using Cloudera’s repo and versions. 
>> But, I’m not having any success with it.
>> 
>> http://stackoverflow.com/questions/31849454/using-phoenix-with-cloudera-hbase-installed-from-repo
>>  
>> <http://stackoverflow.com/questions/31849454/using-phoenix-with-cloudera-hbase-installed-from-repo>
>> 
>> There’s also a Chinese site that does the same thing.
>> 
>> https://www.zybuluo.com/xtccc/note/205739 
>> <https://www.zybuluo.com/xtccc/note/205739>
>> 
>> I keep getting errors like the one’s below.
>> 
>> [ERROR] 
>> /opt/tools/phoenix/phoenix-core/src/main/java/org/apache/hadoop/hbase/regionserver/LocalIndexMerger.java:[110,29]
>>  cannot find symbol
>> [ERROR] symbol:   class Region
>> [ERROR] location: class org.apache.hadoop.hbase.regionserver.LocalIndexMerger
>> …
>> 
>> Have you tried this also?
>> 
>> As a last resort, we will have to abandon Cloudera’s HBase for Apache’s 
>> HBase.
>> 
>> Thanks,
>> Ben
>> 
>> 
>>> On Feb 8, 2016, at 11:04 PM, pierre lacave <pie...@lacave.me 
>>> <mailto:pie...@lacave.me>> wrote:
>>> 
>>> Havent met that one.
>>> 
>>> According to SPARK-1867, the real issue is hidden.
>>> 
>>> I d process by elimination, maybe try in local[*] mode first
>>> 
>>> https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-1867 
>>> <https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-1867>
>>> On Tue, 9 Feb 2016, 04:58 Benjamin Kim <bbuil...@gmail.com 
>>> <mailto:bbuil...@gmail.com>> wrote:
>>> Pierre,
>>> 
>>> I got it to work using phoenix-4.7.0-HBase-1.0-client-spark.jar. But, now, 
>>> I get this error:
>>> 
>>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 
>>> in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 
>>> 0.0 (TID 3, prod-dc1-datanode151.pdc1i.gradientx.com 
>>> <http://prod-dc1-datanode151.pdc1i.gradientx.com/>): 
>>> java.lang.IllegalStateException: unread block data
>>> 
>>> It happens when I do:
>>> 
>>> df.show()
>>> 
>>> Getting closer…
>>> 
>>> Thanks,
>>> Ben
>>> 
>>> 
>>> 
>>>> On Feb 8, 2016, at 2:57 PM, pierre lacave <pie...@lacave.me 
>>>> <mailto:pie...@lacave.me>> wrote:
>>>> 
>>>> This is the wrong client jar try with the one named 
>>>> phoenix-4.7.0-HBase-1.1-client-spark.jar 
>>>> 
>>>> 
>>>> On Mon, 8 Feb 2016, 22:29 Benjamin Kim <bbuil...@gmail.com 
>>>> <mailto:bbuil...@gmail.com>> wrote:
>>>> Hi Josh,
>>>> 
>>>> I tried again by putting the settings within the spark-default.conf.
>>>> 
>>>> spark.driver.extraClassPath=/opt/tools/phoenix/phoenix-4.7.0-HBase-1.0-client.jar
>>>> spark.executor.extraClassPath=/opt/tools/phoenix/phoenix-4.7.0-HBase-1.0-client.jar
>>>> 
>>

Re: Spark Phoenix Plugin

2016-02-10 Thread Benjamin Kim
Hi Pierre,

I am getting this error now.

Error: org.apache.phoenix.exception.PhoenixIOException: 
org.apache.hadoop.hbase.DoNotRetryIOException: 
SYSTEM.CATALOG,,1453397732623.8af7b44f3d7609eb301ad98641ff2611.: 
org.apache.hadoop.hbase.client.Delete.setAttribute(Ljava/lang/String;[B)Lorg/apache/hadoop/hbase/client/Delete;

I even tried to use sqlline.py to do some queries too. It resulted in the same 
error. I followed the installation instructions. Is there something missing?

Thanks,
Ben


> On Feb 9, 2016, at 10:20 AM, Ravi Kiran <maghamraviki...@gmail.com> wrote:
> 
> Hi Pierre,
> 
>   Try your luck for building the artifacts from 
> https://github.com/chiastic-security/phoenix-for-cloudera 
> <https://github.com/chiastic-security/phoenix-for-cloudera>. Hopefully it 
> helps.
> 
> Regards
> Ravi .
> 
> On Tue, Feb 9, 2016 at 10:04 AM, Benjamin Kim <bbuil...@gmail.com 
> <mailto:bbuil...@gmail.com>> wrote:
> Hi Pierre,
> 
> I found this article about how Cloudera’s version of HBase is very different 
> than Apache HBase so it must be compiled using Cloudera’s repo and versions. 
> But, I’m not having any success with it.
> 
> http://stackoverflow.com/questions/31849454/using-phoenix-with-cloudera-hbase-installed-from-repo
>  
> <http://stackoverflow.com/questions/31849454/using-phoenix-with-cloudera-hbase-installed-from-repo>
> 
> There’s also a Chinese site that does the same thing.
> 
> https://www.zybuluo.com/xtccc/note/205739 
> <https://www.zybuluo.com/xtccc/note/205739>
> 
> I keep getting errors like the one’s below.
> 
> [ERROR] 
> /opt/tools/phoenix/phoenix-core/src/main/java/org/apache/hadoop/hbase/regionserver/LocalIndexMerger.java:[110,29]
>  cannot find symbol
> [ERROR] symbol:   class Region
> [ERROR] location: class org.apache.hadoop.hbase.regionserver.LocalIndexMerger
> …
> 
> Have you tried this also?
> 
> As a last resort, we will have to abandon Cloudera’s HBase for Apache’s HBase.
> 
> Thanks,
> Ben
> 
> 
>> On Feb 8, 2016, at 11:04 PM, pierre lacave <pie...@lacave.me 
>> <mailto:pie...@lacave.me>> wrote:
>> 
>> Havent met that one.
>> 
>> According to SPARK-1867, the real issue is hidden.
>> 
>> I d process by elimination, maybe try in local[*] mode first
>> 
>> https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-1867 
>> <https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-1867>
>> On Tue, 9 Feb 2016, 04:58 Benjamin Kim <bbuil...@gmail.com 
>> <mailto:bbuil...@gmail.com>> wrote:
>> Pierre,
>> 
>> I got it to work using phoenix-4.7.0-HBase-1.0-client-spark.jar. But, now, I 
>> get this error:
>> 
>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
>> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
>> (TID 3, prod-dc1-datanode151.pdc1i.gradientx.com 
>> <http://prod-dc1-datanode151.pdc1i.gradientx.com/>): 
>> java.lang.IllegalStateException: unread block data
>> 
>> It happens when I do:
>> 
>> df.show()
>> 
>> Getting closer…
>> 
>> Thanks,
>> Ben
>> 
>> 
>> 
>>> On Feb 8, 2016, at 2:57 PM, pierre lacave <pie...@lacave.me 
>>> <mailto:pie...@lacave.me>> wrote:
>>> 
>>> This is the wrong client jar try with the one named 
>>> phoenix-4.7.0-HBase-1.1-client-spark.jar 
>>> 
>>> 
>>> On Mon, 8 Feb 2016, 22:29 Benjamin Kim <bbuil...@gmail.com 
>>> <mailto:bbuil...@gmail.com>> wrote:
>>> Hi Josh,
>>> 
>>> I tried again by putting the settings within the spark-default.conf.
>>> 
>>> spark.driver.extraClassPath=/opt/tools/phoenix/phoenix-4.7.0-HBase-1.0-client.jar
>>> spark.executor.extraClassPath=/opt/tools/phoenix/phoenix-4.7.0-HBase-1.0-client.jar
>>> 
>>> I still get the same error using the code below.
>>> 
>>> import org.apache.phoenix.spark._
>>> val df = sqlContext.load("org.apache.phoenix.spark", Map("table" -> 
>>> "TEST.MY_TEST", "zkUrl" -> “zk1,zk2,zk3:2181"))
>>> 
>>> Can you tell me what else you’re doing?
>>> 
>>> Thanks,
>>> Ben
>>> 
>>> 
>>>> On Feb 8, 2016, at 1:44 PM, Josh Mahonin <jmaho...@gmail.com 
>>>> <mailto:jmaho...@gmail.com>> wrote:
>>>> 
>>>> Hi Ben,
>>>> 
>>>> I'm not sure about the format of those command line options you're 
>>>> pa

Re: spark 1.6.0 connect to hive metastore

2016-02-09 Thread Benjamin Kim
I got the same problem when I added the Phoenix plugin jar in the driver and 
executor extra classpaths. Do you have those set too?

> On Feb 9, 2016, at 1:12 PM, Koert Kuipers  wrote:
> 
> yes its not using derby i think: i can see the tables in my actual hive 
> metastore.
> 
> i was using a symlink to /etc/hive/conf/hive-site.xml for my hive-site.xml 
> which has a lot more stuff than just hive.metastore.uris
> 
> let me try your approach
> 
> 
> 
> On Tue, Feb 9, 2016 at 3:57 PM, Alexandr Dzhagriev wrote:
> I'm using spark 1.6.0, hive 1.2.1 and there is just one property in the 
> hive-site.xml hive.metastore.uris Works for me. Can you check in the logs, 
> that when the HiveContext is created it connects to the correct uri and 
> doesn't use derby.
> 
> Cheers, Alex.
> 
> On Tue, Feb 9, 2016 at 9:39 PM, Koert Kuipers wrote:
> hey thanks. hive-site is on classpath in conf directory
> 
> i currently got it to work by changing this hive setting in hive-site.xml:
> hive.metastore.schema.verification=true
> to
> hive.metastore.schema.verification=false
> 
> this feels like a hack, because schema verification is a good thing i would 
> assume?
> 
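
For reference, a minimal hive-site.xml sketch combining the two properties discussed 
above (the metastore URI is the one from the log below; turning verification off is 
the workaround being described, not a recommendation):

<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore.mycompany.com:9083</value>
  </property>
  <property>
    <!-- workaround discussed above; verification is normally left enabled -->
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
</configuration>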
> On Tue, Feb 9, 2016 at 3:25 PM, Alexandr Dzhagriev wrote:
> Hi Koert,
> 
> As far as I can see you are using derby:
> 
>  Using direct SQL, underlying DB is DERBY
> 
> not mysql, which is used for the metastore. That means, spark couldn't find 
> hive-site.xml on your classpath. Can you check that, please?
> 
> Thanks, Alex.
> 
> On Tue, Feb 9, 2016 at 8:58 PM, Koert Kuipers wrote:
> has anyone successfully connected to hive metastore using spark 1.6.0? i am 
> having no luck. worked fine with spark 1.5.1 for me. i am on cdh 5.5 and 
> launching spark with yarn.
> 
> this is what i see in logs:
> 16/02/09 14:49:12 INFO hive.metastore: Trying to connect to metastore with 
> URI thrift://metastore.mycompany.com:9083 
> 
> 16/02/09 14:49:12 INFO hive.metastore: Connected to metastore.
> 
> and then a little later:
> 
> 16/02/09 14:49:34 INFO hive.HiveContext: Initializing execution hive, version 
> 1.2.1
> 16/02/09 14:49:34 INFO client.ClientWrapper: Inspected Hadoop version: 
> 2.6.0-cdh5.4.4
> 16/02/09 14:49:34 INFO client.ClientWrapper: Loaded 
> org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0-cdh5.4.4
> 16/02/09 14:49:34 WARN conf.HiveConf: HiveConf of name 
> hive.server2.enable.impersonation does not exist
> 16/02/09 14:49:35 INFO metastore.HiveMetaStore: 0: Opening raw store with 
> implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
> 16/02/09 14:49:35 INFO metastore.ObjectStore: ObjectStore, initialize called
> 16/02/09 14:49:35 INFO DataNucleus.Persistence: Property 
> hive.metastore.integral.jdo.pushdown unknown - will be ignored
> 16/02/09 14:49:35 INFO DataNucleus.Persistence: Property 
> datanucleus.cache.level2 unknown - will be ignored
> 16/02/09 14:49:35 WARN DataNucleus.Connection: BoneCP specified but not 
> present in CLASSPATH (or one of dependencies)
> 16/02/09 14:49:35 WARN DataNucleus.Connection: BoneCP specified but not 
> present in CLASSPATH (or one of dependencies)
> 16/02/09 14:49:37 WARN conf.HiveConf: HiveConf of name 
> hive.server2.enable.impersonation does not exist
> 16/02/09 14:49:37 INFO metastore.ObjectStore: Setting MetaStore object pin 
> classes with 
> hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
> 16/02/09 14:49:38 INFO DataNucleus.Datastore: The class 
> "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as 
> "embedded-only" so does not have its own datastore table.
> 16/02/09 14:49:38 INFO DataNucleus.Datastore: The class 
> "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" 
> so does not have its own datastore table.
> 16/02/09 14:49:40 INFO DataNucleus.Datastore: The class 
> "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as 
> "embedded-only" so does not have its own datastore table.
> 16/02/09 14:49:40 INFO DataNucleus.Datastore: The class 
> "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" 
> so does not have its own datastore table.
> 16/02/09 14:49:40 INFO metastore.MetaStoreDirectSql: Using direct SQL, 
> underlying DB is DERBY
> 16/02/09 14:49:40 INFO metastore.ObjectStore: Initialized ObjectStore
> java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
>   at 
> org.apache.spark.sql.hive.client.ClientWrapper.(ClientWrapper.scala:194)
>   at 
> 

Re: Spark Phoenix Plugin

2016-02-09 Thread Benjamin Kim
Hi Ravi,

I see that the version is still 4.6. Does it include the fix for the Spark 
plugin? https://issues.apache.org/jira/browse/PHOENIX-2503 
<https://issues.apache.org/jira/browse/PHOENIX-2503>

This is the main reason I need it.

Thanks,
Ben

> On Feb 9, 2016, at 10:20 AM, Ravi Kiran <maghamraviki...@gmail.com> wrote:
> 
> Hi Pierre,
> 
>   Try your luck for building the artifacts from 
> https://github.com/chiastic-security/phoenix-for-cloudera 
> <https://github.com/chiastic-security/phoenix-for-cloudera>. Hopefully it 
> helps.
> 
> Regards
> Ravi .
> 
> On Tue, Feb 9, 2016 at 10:04 AM, Benjamin Kim <bbuil...@gmail.com 
> <mailto:bbuil...@gmail.com>> wrote:
> Hi Pierre,
> 
> I found this article about how Cloudera’s version of HBase is very different 
> than Apache HBase so it must be compiled using Cloudera’s repo and versions. 
> But, I’m not having any success with it.
> 
> http://stackoverflow.com/questions/31849454/using-phoenix-with-cloudera-hbase-installed-from-repo
>  
> <http://stackoverflow.com/questions/31849454/using-phoenix-with-cloudera-hbase-installed-from-repo>
> 
> There’s also a Chinese site that does the same thing.
> 
> https://www.zybuluo.com/xtccc/note/205739 
> <https://www.zybuluo.com/xtccc/note/205739>
> 
> I keep getting errors like the one’s below.
> 
> [ERROR] 
> /opt/tools/phoenix/phoenix-core/src/main/java/org/apache/hadoop/hbase/regionserver/LocalIndexMerger.java:[110,29]
>  cannot find symbol
> [ERROR] symbol:   class Region
> [ERROR] location: class org.apache.hadoop.hbase.regionserver.LocalIndexMerger
> …
> 
> Have you tried this also?
> 
> As a last resort, we will have to abandon Cloudera’s HBase for Apache’s HBase.
> 
> Thanks,
> Ben
> 
> 
>> On Feb 8, 2016, at 11:04 PM, pierre lacave <pie...@lacave.me 
>> <mailto:pie...@lacave.me>> wrote:
>> 
>> Havent met that one.
>> 
>> According to SPARK-1867, the real issue is hidden.
>> 
>> I d process by elimination, maybe try in local[*] mode first
>> 
>> https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-1867 
>> <https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-1867>
>> On Tue, 9 Feb 2016, 04:58 Benjamin Kim <bbuil...@gmail.com 
>> <mailto:bbuil...@gmail.com>> wrote:
>> Pierre,
>> 
>> I got it to work using phoenix-4.7.0-HBase-1.0-client-spark.jar. But, now, I 
>> get this error:
>> 
>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
>> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
>> (TID 3, prod-dc1-datanode151.pdc1i.gradientx.com 
>> <http://prod-dc1-datanode151.pdc1i.gradientx.com/>): 
>> java.lang.IllegalStateException: unread block data
>> 
>> It happens when I do:
>> 
>> df.show()
>> 
>> Getting closer…
>> 
>> Thanks,
>> Ben
>> 
>> 
>> 
>>> On Feb 8, 2016, at 2:57 PM, pierre lacave <pie...@lacave.me 
>>> <mailto:pie...@lacave.me>> wrote:
>>> 
>>> This is the wrong client jar try with the one named 
>>> phoenix-4.7.0-HBase-1.1-client-spark.jar 
>>> 
>>> 
>>> On Mon, 8 Feb 2016, 22:29 Benjamin Kim <bbuil...@gmail.com 
>>> <mailto:bbuil...@gmail.com>> wrote:
>>> Hi Josh,
>>> 
>>> I tried again by putting the settings within the spark-default.conf.
>>> 
>>> spark.driver.extraClassPath=/opt/tools/phoenix/phoenix-4.7.0-HBase-1.0-client.jar
>>> spark.executor.extraClassPath=/opt/tools/phoenix/phoenix-4.7.0-HBase-1.0-client.jar
>>> 
>>> I still get the same error using the code below.
>>> 
>>> import org.apache.phoenix.spark._
>>> val df = sqlContext.load("org.apache.phoenix.spark", Map("table" -> 
>>> "TEST.MY_TEST", "zkUrl" -> “zk1,zk2,zk3:2181"))
>>> 
>>> Can you tell me what else you’re doing?
>>> 
>>> Thanks,
>>> Ben
>>> 
>>> 
>>>> On Feb 8, 2016, at 1:44 PM, Josh Mahonin <jmaho...@gmail.com 
>>>> <mailto:jmaho...@gmail.com>> wrote:
>>>> 
>>>> Hi Ben,
>>>> 
>>>> I'm not sure about the format of those command line options you're 
>>>> passing. I've had success with spark-shell just by setting the 
>>>> 'spark.executor.extraClassPath' and 'spark.driver.extraClassPath' options 
>>>> on the spark config, as per the docs [1].
>>>> 
>>>> I'm 

Re: HBase Interpreter

2016-02-09 Thread Benjamin Kim
It looks like it’s not reaching the zookeeper quorum.

16/02/09 21:52:19 ERROR client.ConnectionManager$HConnectionImplementation: 
Can't get connection to ZooKeeper: KeeperErrorCode = ConnectionLoss for /hbase

And the setting is:

quorum=localhost:2181

The HBase quorum is actually namenode001, namenode002, hbase-master001. Where 
do I set this?

Thanks,
Ben
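
A minimal hbase-site.xml sketch pointing a client at that quorum (this assumes the 
interpreter ends up reading an hbase-site.xml from its classpath, which is exactly 
the open question in this thread; 2181 is just the usual default client port):

<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>namenode001,namenode002,hbase-master001</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>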


> On Feb 4, 2016, at 9:15 PM, Felix Cheung <felixcheun...@hotmail.com> wrote:
> 
> We could probably look into HBase/Pom.xml handling the vendor-repo profile 
> too.
> 
> 
> 
> 
> 
> On Thu, Feb 4, 2016 at 8:08 PM -0800, "Rajat Venkatesh" 
> <rvenkat...@qubole.com <mailto:rvenkat...@qubole.com>> wrote:
> 
> Benjamin,
> Can you try compiling Zeppelin by changing the dependencies in hbase/pom.xml 
> to use cloudera jars ? 
> In the long run, one option is to
> 1. run & capture o/p of 'bin/hbase classpath'
> 2. create a classloader
> 3. load all the classes from 1
> 
> Then it will work with any version of HBase theoretically.
>  
> 
> On Fri, Feb 5, 2016 at 8:14 AM Benjamin Kim <bbuil...@gmail.com 
> <mailto:bbuil...@gmail.com>> wrote:
> Felix,
> 
> I know that Cloudera practice. We hate that they do that without informing 
> anyone.
> 
> Thanks,
> Ben
> 
> 
> 
>> On Feb 4, 2016, at 9:18 AM, Felix Cheung <felixcheun...@hotmail.com 
>> <mailto:felixcheun...@hotmail.com>> wrote:
>> 
>> CDH is known to cherry pick patches from later releases. Maybe it is because 
>> of that.
>> 
>> Rajat do you have any lead on the release compatibility issue?
>> 
>> 
>> _
>> From: Rajat Venkatesh <rvenkat...@qubole.com <mailto:rvenkat...@qubole.com>>
>> Sent: Wednesday, February 3, 2016 10:05 PM
>> Subject: Re: HBase Interpreter
>> To: <users@zeppelin.incubator.apache.org 
>> <mailto:users@zeppelin.incubator.apache.org>>
>> 
>> 
>> Oh. That should work. I've tested with 1.0.0. Hmm
>> 
>> On Thu, Feb 4, 2016 at 10:50 AM Benjamin Kim < bbuil...@gmail.com 
>> <mailto:bbuil...@gmail.com>> wrote: 
>> Hi Rajat,
>> 
>> The version of HBase that comes with CDH 5.4.8 is 1.0.0. How do I check if 
>> they are compatible?
>> 
>> Thanks,
>> Ben
>> 
>> 
>> On Feb 3, 2016, at 9:16 PM, Rajat Venkatesh < rvenkat...@qubole.com 
>> <mailto:rvenkat...@qubole.com>> wrote:
>> 
>> Can you check the version of HBase ? HBase interpreter has been tested with 
>> HBase 1.0.x and Hadoop 2.6.0. There is a good chance this error is due to 
>> mismatch in versions. 
>> 
>> On Thu, Feb 4, 2016 at 10:20 AM Benjamin Kim < bbuil...@gmail.com 
>> <mailto:bbuil...@gmail.com>> wrote: 
>> I got this error below trying out the new HBase Interpreter after pulling 
>> and compiling the latest. 
>> 
>> org.jruby.exceptions.RaiseException: (NameError) cannot load Java class 
>> org.apache.hadoop.hbase.quotas.ThrottleType 
>> at 
>> org.jruby.javasupport.JavaUtilities.get_proxy_or_package_under_package(org/jruby/javasupport/JavaUtilities.java:54)
>>  
>> at (Anonymous).method_missing(/builtin/javasupport/java.rb:51) 
>> at 
>> (Anonymous).(root)(/opt/cloudera/parcels/CDH/lib/hbase/lib/ruby/hbase/quotas.rb:23)
>>  
>> at org.jruby.RubyKernel.require(org/jruby/RubyKernel.java:1062) 
>> at 
>> (Anonymous).(root)(/opt/cloudera/parcels/CDH/lib/hbase/lib/ruby/hbase/quotas.rb:24)
>>  
>> at org.jruby.RubyKernel.require(org/jruby/RubyKernel.java:1062) 
>> at 
>> (Anonymous).(root)(/opt/cloudera/parcels/CDH/lib/hbase/lib/ruby/hbase/hbase.rb:90)
>>  
>> at org.jruby.RubyKernel.require(org/jruby/RubyKernel.java:1062) 
>> at 
>> (Anonymous).(root)(/opt/cloudera/parcels/CDH/lib/hbase/lib/ruby/hbase.rb:118)
>>  
>> 
>> Is there something I’m missing. Is it because I’m using CDH 5.4.8? 
>> 
>> Thanks, 
>> Ben
>> 
>> 
>> 
> 



Re: Spark Phoenix Plugin

2016-02-09 Thread Benjamin Kim
Hi Pierre,

I found this article about how Cloudera’s version of HBase is different enough from 
Apache HBase that Phoenix must be compiled against Cloudera’s repo and versions. 
But I’m not having any success with it.

http://stackoverflow.com/questions/31849454/using-phoenix-with-cloudera-hbase-installed-from-repo

There’s also a Chinese site that does the same thing.

https://www.zybuluo.com/xtccc/note/205739
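
From what I can tell, both write-ups boil down to the same thing: add Cloudera’s Maven repository to the Phoenix pom and override the HBase/Hadoop versions with the CDH ones before building. A rough sketch of what I am attempting (the property names are my reading of the articles, so treat them as assumptions):

<repository>
  <id>cloudera</id>
  <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>

mvn clean package -DskipTests -Dhbase.version=1.0.0-cdh5.4.8 -Dhadoop-two.version=2.6.0-cdh5.4.8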

I keep getting errors like the ones below.

[ERROR] 
/opt/tools/phoenix/phoenix-core/src/main/java/org/apache/hadoop/hbase/regionserver/LocalIndexMerger.java:[110,29]
 cannot find symbol
[ERROR] symbol:   class Region
[ERROR] location: class org.apache.hadoop.hbase.regionserver.LocalIndexMerger
…

Have you tried this also?

As a last resort, we will have to abandon Cloudera’s HBase for Apache’s HBase.

Thanks,
Ben


> On Feb 8, 2016, at 11:04 PM, pierre lacave <pie...@lacave.me> wrote:
> 
> Havent met that one.
> 
> According to SPARK-1867, the real issue is hidden.
> 
> I'd proceed by elimination, maybe try in local[*] mode first
> 
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-1867 
> <https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-1867>
> On Tue, 9 Feb 2016, 04:58 Benjamin Kim <bbuil...@gmail.com 
> <mailto:bbuil...@gmail.com>> wrote:
> Pierre,
> 
> I got it to work using phoenix-4.7.0-HBase-1.0-client-spark.jar. But, now, I 
> get this error:
> 
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
> (TID 3, prod-dc1-datanode151.pdc1i.gradientx.com 
> <http://prod-dc1-datanode151.pdc1i.gradientx.com/>): 
> java.lang.IllegalStateException: unread block data
> 
> It happens when I do:
> 
> df.show()
> 
> Getting closer…
> 
> Thanks,
> Ben
> 
> 
> 
>> On Feb 8, 2016, at 2:57 PM, pierre lacave <pie...@lacave.me 
>> <mailto:pie...@lacave.me>> wrote:
>> 
>> This is the wrong client jar try with the one named 
>> phoenix-4.7.0-HBase-1.1-client-spark.jar 
>> 
>> 
>> On Mon, 8 Feb 2016, 22:29 Benjamin Kim <bbuil...@gmail.com 
>> <mailto:bbuil...@gmail.com>> wrote:
>> Hi Josh,
>> 
>> I tried again by putting the settings within the spark-default.conf.
>> 
>> spark.driver.extraClassPath=/opt/tools/phoenix/phoenix-4.7.0-HBase-1.0-client.jar
>> spark.executor.extraClassPath=/opt/tools/phoenix/phoenix-4.7.0-HBase-1.0-client.jar
>> 
>> I still get the same error using the code below.
>> 
>> import org.apache.phoenix.spark._
>> val df = sqlContext.load("org.apache.phoenix.spark", Map("table" -> 
>> "TEST.MY_TEST", "zkUrl" -> "zk1,zk2,zk3:2181"))
>> 
>> Can you tell me what else you’re doing?
>> 
>> Thanks,
>> Ben
>> 
>> 
>>> On Feb 8, 2016, at 1:44 PM, Josh Mahonin <jmaho...@gmail.com 
>>> <mailto:jmaho...@gmail.com>> wrote:
>>> 
>>> Hi Ben,
>>> 
>>> I'm not sure about the format of those command line options you're passing. 
>>> I've had success with spark-shell just by setting the 
>>> 'spark.executor.extraClassPath' and 'spark.driver.extraClassPath' options 
>>> on the spark config, as per the docs [1].
>>> 
>>> I'm not sure if there's anything special needed for CDH or not though. I 
>>> also have a docker image I've been toying with which has a working 
>>> Spark/Phoenix setup using the Phoenix 4.7.0 RC and Spark 1.6.0. It might be 
>>> a useful reference for you as well [2].
>>> 
>>> Good luck,
>>> 
>>> Josh
>>> 
>>> [1] https://phoenix.apache.org/phoenix_spark.html 
>>> <https://phoenix.apache.org/phoenix_spark.html>
>>> [2] https://github.com/jmahonin/docker-phoenix/tree/phoenix_spark 
>>> <https://github.com/jmahonin/docker-phoenix/tree/phoenix_spark>
>>> 
>>> On Mon, Feb 8, 2016 at 4:29 PM, Benjamin Kim <bbuil...@gmail.com 
>>> <mailto:bbuil...@gmail.com>> wrote:
>>> Hi Pierre,
>>> 
>>> I tried to run in spark-shell using spark 1.6.0 by running this:
>>> 
>>> spark-shell --master yarn-client --driver-class-path 
>>> /opt/tools/phoenix/phoenix-4.7.0-HBase-1.0-client.jar --driver-java-options 
>>> "-Dspark.executor.extraClassPath=/opt/tools/phoenix/phoenix-4.7.0-HBase-1.0-client.jar”
>>> 
>>> The version of HBase is the one in CDH5.4.8, which is 1.0.0-cdh5.4.8.
>>> 
>>> When I get to the line:
>>> 
>>> val df = sql

Re: Spark Phoenix Plugin

2016-02-08 Thread Benjamin Kim
Pierre,

I got it to work using phoenix-4.7.0-HBase-1.0-client-spark.jar. But now I
get this error:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 
3, prod-dc1-datanode151.pdc1i.gradientx.com): java.lang.IllegalStateException: 
unread block data

It happens when I do:

df.show()

Getting closer…
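
In case it helps anyone else following along, swapping in that jar was just a matter of pointing the same two spark-defaults.conf entries from my earlier mail at the spark-specific client jar instead of the plain one (roughly):

spark.driver.extraClassPath=/opt/tools/phoenix/phoenix-4.7.0-HBase-1.0-client-spark.jar
spark.executor.extraClassPath=/opt/tools/phoenix/phoenix-4.7.0-HBase-1.0-client-spark.jar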

Thanks,
Ben



> On Feb 8, 2016, at 2:57 PM, pierre lacave <pie...@lacave.me> wrote:
> 
> This is the wrong client jar try with the one named 
> phoenix-4.7.0-HBase-1.1-client-spark.jar 
> 
> 
> On Mon, 8 Feb 2016, 22:29 Benjamin Kim <bbuil...@gmail.com 
> <mailto:bbuil...@gmail.com>> wrote:
> Hi Josh,
> 
> I tried again by putting the settings within the spark-default.conf.
> 
> spark.driver.extraClassPath=/opt/tools/phoenix/phoenix-4.7.0-HBase-1.0-client.jar
> spark.executor.extraClassPath=/opt/tools/phoenix/phoenix-4.7.0-HBase-1.0-client.jar
> 
> I still get the same error using the code below.
> 
> import org.apache.phoenix.spark._
> val df = sqlContext.load("org.apache.phoenix.spark", Map("table" -> 
> "TEST.MY_TEST", "zkUrl" -> "zk1,zk2,zk3:2181"))
> 
> Can you tell me what else you’re doing?
> 
> Thanks,
> Ben
> 
> 
>> On Feb 8, 2016, at 1:44 PM, Josh Mahonin <jmaho...@gmail.com 
>> <mailto:jmaho...@gmail.com>> wrote:
>> 
>> Hi Ben,
>> 
>> I'm not sure about the format of those command line options you're passing. 
>> I've had success with spark-shell just by setting the 
>> 'spark.executor.extraClassPath' and 'spark.driver.extraClassPath' options on 
>> the spark config, as per the docs [1].
>> 
>> I'm not sure if there's anything special needed for CDH or not though. I 
>> also have a docker image I've been toying with which has a working 
>> Spark/Phoenix setup using the Phoenix 4.7.0 RC and Spark 1.6.0. It might be 
>> a useful reference for you as well [2].
>> 
>> Good luck,
>> 
>> Josh
>> 
>> [1] https://phoenix.apache.org/phoenix_spark.html 
>> <https://phoenix.apache.org/phoenix_spark.html>
>> [2] https://github.com/jmahonin/docker-phoenix/tree/phoenix_spark 
>> <https://github.com/jmahonin/docker-phoenix/tree/phoenix_spark>
>> 
>> On Mon, Feb 8, 2016 at 4:29 PM, Benjamin Kim <bbuil...@gmail.com 
>> <mailto:bbuil...@gmail.com>> wrote:
>> Hi Pierre,
>> 
>> I tried to run in spark-shell using spark 1.6.0 by running this:
>> 
>> spark-shell --master yarn-client --driver-class-path 
>> /opt/tools/phoenix/phoenix-4.7.0-HBase-1.0-client.jar --driver-java-options 
>> "-Dspark.executor.extraClassPath=/opt/tools/phoenix/phoenix-4.7.0-HBase-1.0-client.jar”
>> 
>> The version of HBase is the one in CDH5.4.8, which is 1.0.0-cdh5.4.8.
>> 
>> When I get to the line:
>> 
>> val df = sqlContext.load("org.apache.phoenix.spark", Map("table" -> 
>> "TEST.MY_TEST", "zkUrl" -> "zk1,zk2,zk3:2181"))
>> 
>> I get this error:
>> 
>> java.lang.NoClassDefFoundError: Could not initialize class 
>> org.apache.spark.rdd.RDDOperationScope$
>> 
>> Any ideas?
>> 
>> Thanks,
>> Ben
>> 
>> 
>>> On Feb 5, 2016, at 1:36 PM, pierre lacave <pie...@lacave.me 
>>> <mailto:pie...@lacave.me>> wrote:
>>> 
>>> I don't know when the full release will be, RC1 just got pulled out, and 
>>> expecting RC2 soon
>>> 
>>> you can find them here 
>>> 
>>> https://dist.apache.org/repos/dist/dev/phoenix/ 
>>> <https://dist.apache.org/repos/dist/dev/phoenix/>
>>> 
>>> 
>>> there is a new phoenix-4.7.0-HBase-1.1-client-spark.jar that is all you 
>>> need to have in spark classpath
>>> 
>>> 
>>> Pierre Lacave
>>> 171 Skellig House, Custom House, Lower Mayor street, Dublin 1, Ireland
>>> Phone :   +353879128708 <tel:%2B353879128708>
>>> 
>>> On Fri, Feb 5, 2016 at 9:28 PM, Benjamin Kim <bbuil...@gmail.com 
>>> <mailto:bbuil...@gmail.com>> wrote:
>>> Hi Pierre,
>>> 
>>> When will I be able to download this version?
>>> 
>>> Thanks,
>>> Ben
>>> 
>>> 
>>> On Friday, February 5, 2016, pierre lacave <pie...@lacave.me 
>>> <mailto:pie...@lacave.me>> wrote:
>>> This was addressed in Phoenix 4.7 (currently in RC) 
>>> https://issues.apac

Re: Spark Phoenix Plugin

2016-02-08 Thread Benjamin Kim
Hi Pierre,

I tried to run it in spark-shell with Spark 1.6.0 using this command:

spark-shell --master yarn-client --driver-class-path 
/opt/tools/phoenix/phoenix-4.7.0-HBase-1.0-client.jar --driver-java-options 
"-Dspark.executor.extraClassPath=/opt/tools/phoenix/phoenix-4.7.0-HBase-1.0-client.jar”

The version of HBase is the one in CDH5.4.8, which is 1.0.0-cdh5.4.8.

When I get to the line:

val df = sqlContext.load("org.apache.phoenix.spark", Map("table" -> 
"TEST.MY_TEST", "zkUrl" -> "zk1,zk2,zk3:2181"))

I get this error:

java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.spark.rdd.RDDOperationScope$
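
One more thing I can try is passing both classpaths as --conf options instead of --driver-java-options, something like this (same jar path as above):

spark-shell --master yarn-client \
  --conf spark.driver.extraClassPath=/opt/tools/phoenix/phoenix-4.7.0-HBase-1.0-client.jar \
  --conf spark.executor.extraClassPath=/opt/tools/phoenix/phoenix-4.7.0-HBase-1.0-client.jar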

Any ideas?

Thanks,
Ben


> On Feb 5, 2016, at 1:36 PM, pierre lacave <pie...@lacave.me> wrote:
> 
> I don't know when the full release will be, RC1 just got pulled out, and 
> expecting RC2 soon
> 
> you can find them here 
> 
> https://dist.apache.org/repos/dist/dev/phoenix/ 
> <https://dist.apache.org/repos/dist/dev/phoenix/>
> 
> 
> there is a new phoenix-4.7.0-HBase-1.1-client-spark.jar that is all you need 
> to have in spark classpath
> 
> 
> Pierre Lacave
> 171 Skellig House, Custom House, Lower Mayor street, Dublin 1, Ireland
> Phone :   +353879128708
> 
> On Fri, Feb 5, 2016 at 9:28 PM, Benjamin Kim <bbuil...@gmail.com 
> <mailto:bbuil...@gmail.com>> wrote:
> Hi Pierre,
> 
> When will I be able to download this version?
> 
> Thanks,
> Ben
> 
> 
> On Friday, February 5, 2016, pierre lacave <pie...@lacave.me 
> <mailto:pie...@lacave.me>> wrote:
> This was addressed in Phoenix 4.7 (currently in RC) 
> https://issues.apache.org/jira/browse/PHOENIX-2503 
> <https://issues.apache.org/jira/browse/PHOENIX-2503>
> 
> 
> 
> 
> Pierre Lacave
> 171 Skellig House, Custom House, Lower Mayor street, Dublin 1, Ireland
> Phone :   +353879128708 <tel:%2B353879128708>
> 
> On Fri, Feb 5, 2016 at 6:17 PM, Benjamin Kim <bbuil...@gmail.com <>> wrote:
> I cannot get this plugin to work in CDH 5.4.8 using Phoenix 4.5.2 and Spark 
> 1.6. When I try to launch spark-shell, I get:
> 
> java.lang.RuntimeException: java.lang.RuntimeException: Unable to 
> instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
> 
> I continue on and run the example code. When I get to the line below:
> 
> val df = sqlContext.load("org.apache.phoenix.spark", Map("table" -> 
> "TEST.MY_TEST", "zkUrl" -> "zookeeper1,zookeeper2,zookeeper3:2181"))
> 
> I get this error:
> 
> java.lang.NoSuchMethodError: 
> com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer$.handledType()Ljava/lang/Class;
> 
> Can someone help?
> 
> Thanks,
> Ben
> 
> 



Re: Spark Phoenix Plugin

2016-02-08 Thread Benjamin Kim
Hi Josh,

I tried again by putting the settings in spark-defaults.conf.

spark.driver.extraClassPath=/opt/tools/phoenix/phoenix-4.7.0-HBase-1.0-client.jar
spark.executor.extraClassPath=/opt/tools/phoenix/phoenix-4.7.0-HBase-1.0-client.jar

I still get the same error using the code below.

import org.apache.phoenix.spark._
val df = sqlContext.load("org.apache.phoenix.spark", Map("table" -> 
"TEST.MY_TEST", "zkUrl" -> "zk1,zk2,zk3:2181"))

Can you tell me what else you’re doing?

Thanks,
Ben


> On Feb 8, 2016, at 1:44 PM, Josh Mahonin <jmaho...@gmail.com> wrote:
> 
> Hi Ben,
> 
> I'm not sure about the format of those command line options you're passing. 
> I've had success with spark-shell just by setting the 
> 'spark.executor.extraClassPath' and 'spark.driver.extraClassPath' options on 
> the spark config, as per the docs [1].
> 
> I'm not sure if there's anything special needed for CDH or not though. I also 
> have a docker image I've been toying with which has a working Spark/Phoenix 
> setup using the Phoenix 4.7.0 RC and Spark 1.6.0. It might be a useful 
> reference for you as well [2].
> 
> Good luck,
> 
> Josh
> 
> [1] https://phoenix.apache.org/phoenix_spark.html 
> <https://phoenix.apache.org/phoenix_spark.html>
> [2] https://github.com/jmahonin/docker-phoenix/tree/phoenix_spark 
> <https://github.com/jmahonin/docker-phoenix/tree/phoenix_spark>
> 
> On Mon, Feb 8, 2016 at 4:29 PM, Benjamin Kim <bbuil...@gmail.com 
> <mailto:bbuil...@gmail.com>> wrote:
> Hi Pierre,
> 
> I tried to run in spark-shell using spark 1.6.0 by running this:
> 
> spark-shell --master yarn-client --driver-class-path 
> /opt/tools/phoenix/phoenix-4.7.0-HBase-1.0-client.jar --driver-java-options 
> "-Dspark.executor.extraClassPath=/opt/tools/phoenix/phoenix-4.7.0-HBase-1.0-client.jar”
> 
> The version of HBase is the one in CDH5.4.8, which is 1.0.0-cdh5.4.8.
> 
> When I get to the line:
> 
> val df = sqlContext.load("org.apache.phoenix.spark", Map("table" -> 
> "TEST.MY_TEST", "zkUrl" -> "zk1,zk2,zk3:2181"))
> 
> I get this error:
> 
> java.lang.NoClassDefFoundError: Could not initialize class 
> org.apache.spark.rdd.RDDOperationScope$
> 
> Any ideas?
> 
> Thanks,
> Ben
> 
> 
>> On Feb 5, 2016, at 1:36 PM, pierre lacave <pie...@lacave.me 
>> <mailto:pie...@lacave.me>> wrote:
>> 
>> I don't know when the full release will be, RC1 just got pulled out, and 
>> expecting RC2 soon
>> 
>> you can find them here 
>> 
>> https://dist.apache.org/repos/dist/dev/phoenix/ 
>> <https://dist.apache.org/repos/dist/dev/phoenix/>
>> 
>> 
>> there is a new phoenix-4.7.0-HBase-1.1-client-spark.jar that is all you need 
>> to have in spark classpath
>> 
>> 
>> Pierre Lacave
>> 171 Skellig House, Custom House, Lower Mayor street, Dublin 1, Ireland
>> Phone :   +353879128708 <tel:%2B353879128708>
>> 
>> On Fri, Feb 5, 2016 at 9:28 PM, Benjamin Kim <bbuil...@gmail.com 
>> <mailto:bbuil...@gmail.com>> wrote:
>> Hi Pierre,
>> 
>> When will I be able to download this version?
>> 
>> Thanks,
>> Ben
>> 
>> 
>> On Friday, February 5, 2016, pierre lacave <pie...@lacave.me 
>> <mailto:pie...@lacave.me>> wrote:
>> This was addressed in Phoenix 4.7 (currently in RC) 
>> https://issues.apache.org/jira/browse/PHOENIX-2503 
>> <https://issues.apache.org/jira/browse/PHOENIX-2503>
>> 
>> 
>> 
>> 
>> Pierre Lacave
>> 171 Skellig House, Custom House, Lower Mayor street, Dublin 1, Ireland
>> Phone :   +353879128708 <tel:%2B353879128708>
>> 
>> On Fri, Feb 5, 2016 at 6:17 PM, Benjamin Kim <bbuil...@gmail.com <>> wrote:
>> I cannot get this plugin to work in CDH 5.4.8 using Phoenix 4.5.2 and Spark 
>> 1.6. When I try to launch spark-shell, I get:
>> 
>> java.lang.RuntimeException: java.lang.RuntimeException: Unable to 
>> instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
>> 
>> I continue on and run the example code. When I get to the line below:
>> 
>> val df = sqlContext.load("org.apache.phoenix.spark", Map("table" -> 
>> "TEST.MY_TEST", "zkUrl" -> "zookeeper1,zookeeper2,zookeeper3:2181"))
>> 
>> I get this error:
>> 
>> java.lang.NoSuchMethodError: 
>> com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer$.handledType()Ljava/lang/Class;
>> 
>> Can someone help?
>> 
>> Thanks,
>> Ben
>> 
>> 
> 
> 



Re: Spark Phoenix Plugin

2016-02-05 Thread Benjamin Kim
Hi Pierre,

When will I be able to download this version?

Thanks,
Ben

On Friday, February 5, 2016, pierre lacave <pie...@lacave.me> wrote:

> This was addressed in Phoenix 4.7 (currently in RC)
> https://issues.apache.org/jira/browse/PHOENIX-2503
>
>
>
>
> *Pierre Lacave*
> 171 Skellig House, Custom House, Lower Mayor street, Dublin 1, Ireland
> Phone :   +353879128708
>
> On Fri, Feb 5, 2016 at 6:17 PM, Benjamin Kim <bbuil...@gmail.com
> <javascript:_e(%7B%7D,'cvml','bbuil...@gmail.com');>> wrote:
>
>> I cannot get this plugin to work in CDH 5.4.8 using Phoenix 4.5.2 and
>> Spark 1.6. When I try to launch spark-shell, I get:
>>
>> java.lang.RuntimeException: java.lang.RuntimeException: Unable to
>> instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
>>
>> I continue on and run the example code. When I get to the line below:
>>
>> val df = sqlContext.load("org.apache.phoenix.spark", Map("table"
>> -> "TEST.MY_TEST", "zkUrl" -> "zookeeper1,zookeeper2,zookeeper3:2181"))
>>
>> I get this error:
>>
>> java.lang.NoSuchMethodError:
>> com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer$.handledType()Ljava/lang/Class;
>>
>> Can someone help?
>>
>> Thanks,
>> Ben
>
>
>


Re: HBase Interpreter

2016-02-04 Thread Benjamin Kim
Please tell me what values to put in the properties for the HBase version.

Thanks,
Ben


> On Feb 4, 2016, at 9:15 PM, Felix Cheung <felixcheun...@hotmail.com> wrote:
> 
> We could probably look into HBase/Pom.xml handling the vendor-repo profile 
> too.
> 
> 
> 
> 
> 
> On Thu, Feb 4, 2016 at 8:08 PM -0800, "Rajat Venkatesh" 
> <rvenkat...@qubole.com <mailto:rvenkat...@qubole.com>> wrote:
> 
> Benjamin,
> Can you try compiling Zeppelin by changing the dependencies in hbase/pom.xml 
> to use cloudera jars ? 
> In the long run, one option is to
> 1. run & capture o/p of 'bin/hbase classpath'
> 2. create a classloader
> 3. load all the classes from 1
> 
> Then it will work with any version of HBase theoretically.
>  
> 
> On Fri, Feb 5, 2016 at 8:14 AM Benjamin Kim <bbuil...@gmail.com 
> <mailto:bbuil...@gmail.com>> wrote:
> Felix,
> 
> I know that Cloudera practice. We hate that they do that without informing 
> anyone.
> 
> Thanks,
> Ben
> 
> 
> 
>> On Feb 4, 2016, at 9:18 AM, Felix Cheung <felixcheun...@hotmail.com 
>> <mailto:felixcheun...@hotmail.com>> wrote:
>> 
>> CDH is known to cherry pick patches from later releases. Maybe it is because 
>> of that.
>> 
>> Rajat do you have any lead on the release compatibility issue?
>> 
>> 
>> _
>> From: Rajat Venkatesh <rvenkat...@qubole.com <mailto:rvenkat...@qubole.com>>
>> Sent: Wednesday, February 3, 2016 10:05 PM
>> Subject: Re: HBase Interpreter
>> To: <users@zeppelin.incubator.apache.org 
>> <mailto:users@zeppelin.incubator.apache.org>>
>> 
>> 
>> Oh. That should work. I've tested with 1.0.0. Hmm
>> 
>> On Thu, Feb 4, 2016 at 10:50 AM Benjamin Kim < bbuil...@gmail.com 
>> <mailto:bbuil...@gmail.com>> wrote: 
>> Hi Rajat,
>> 
>> The version of HBase that comes with CDH 5.4.8 is 1.0.0. How do I check if 
>> they are compatible?
>> 
>> Thanks,
>> Ben
>> 
>> 
>> On Feb 3, 2016, at 9:16 PM, Rajat Venkatesh < rvenkat...@qubole.com 
>> <mailto:rvenkat...@qubole.com>> wrote:
>> 
>> Can you check the version of HBase ? HBase interpreter has been tested with 
>> HBase 1.0.x and Hadoop 2.6.0. There is a good chance this error is due to 
>> mismatch in versions. 
>> 
>> On Thu, Feb 4, 2016 at 10:20 AM Benjamin Kim < bbuil...@gmail.com 
>> <mailto:bbuil...@gmail.com>> wrote: 
>> I got this error below trying out the new HBase Interpreter after pulling 
>> and compiling the latest. 
>> 
>> org.jruby.exceptions.RaiseException: (NameError) cannot load Java class 
>> org.apache.hadoop.hbase.quotas.ThrottleType 
>> at 
>> org.jruby.javasupport.JavaUtilities.get_proxy_or_package_under_package(org/jruby/javasupport/JavaUtilities.java:54)
>>  
>> at (Anonymous).method_missing(/builtin/javasupport/java.rb:51) 
>> at 
>> (Anonymous).(root)(/opt/cloudera/parcels/CDH/lib/hbase/lib/ruby/hbase/quotas.rb:23)
>>  
>> at org.jruby.RubyKernel.require(org/jruby/RubyKernel.java:1062) 
>> at 
>> (Anonymous).(root)(/opt/cloudera/parcels/CDH/lib/hbase/lib/ruby/hbase/quotas.rb:24)
>>  
>> at org.jruby.RubyKernel.require(org/jruby/RubyKernel.java:1062) 
>> at 
>> (Anonymous).(root)(/opt/cloudera/parcels/CDH/lib/hbase/lib/ruby/hbase/hbase.rb:90)
>>  
>> at org.jruby.RubyKernel.require(org/jruby/RubyKernel.java:1062) 
>> at 
>> (Anonymous).(root)(/opt/cloudera/parcels/CDH/lib/hbase/lib/ruby/hbase.rb:118)
>>  
>> 
>> Is there something I’m missing? Is it because I’m using CDH 5.4.8? 
>> 
>> Thanks, 
>> Ben
>> 
>> 
>> 
> 



Re: HBase Interpreter

2016-02-04 Thread Benjamin Kim
Sure. I'll do that.

On Thursday, February 4, 2016, Felix Cheung <felixcheun...@hotmail.com>
wrote:

> Sorry to clarify I was referring to changes to Maven's project file that
> could allow changes to dependencies at project build time. We would need to
> look into that - will loop you in for validating for sure if you'd like.
>
>
>
> _________
> From: Benjamin Kim <bbuil...@gmail.com
> <javascript:_e(%7B%7D,'cvml','bbuil...@gmail.com');>>
> Sent: Thursday, February 4, 2016 9:39 PM
> Subject: Re: HBase Interpreter
> To: <users@zeppelin.incubator.apache.org
> <javascript:_e(%7B%7D,'cvml','users@zeppelin.incubator.apache.org');>>
>
>
> Please, tell me what values to put as the properties for hbase version?
>
> Thanks,
> Ben
>
>
> On Feb 4, 2016, at 9:15 PM, Felix Cheung < felixcheun...@hotmail.com
> <javascript:_e(%7B%7D,'cvml','felixcheun...@hotmail.com');>> wrote:
>
> We could probably look into HBase/Pom.xml handling the vendor-repo profile
> too.
>
>
>
>
>
> On Thu, Feb 4, 2016 at 8:08 PM -0800, "Rajat Venkatesh" <
> rvenkat...@qubole.com
> <javascript:_e(%7B%7D,'cvml','rvenkat...@qubole.com');>> wrote:
>
> Benjamin,
> Can you try compiling Zeppelin by changing the dependencies in
> hbase/pom.xml to use cloudera jars ?
> In the long run, one option is to
> 1. run & capture o/p of 'bin/hbase classpath'
> 2. create a classloader
> 3. load all the classes from 1
>
> Then it will work with any version of HBase theoretically.
>
>
> On Fri, Feb 5, 2016 at 8:14 AM Benjamin Kim < bbuil...@gmail.com
> <javascript:_e(%7B%7D,'cvml','bbuil...@gmail.com');>> wrote:
>
> Felix,
>
> I know that Cloudera practice. We hate that they do that without informing
> anyone.
>
> Thanks,
> Ben
>
>
>
> On Feb 4, 2016, at 9:18 AM, Felix Cheung < felixcheun...@hotmail.com
> <javascript:_e(%7B%7D,'cvml','felixcheun...@hotmail.com');>> wrote:
>
> CDH is known to cherry pick patches from later releases. Maybe it is
> because of that.
>
> Rajat do you have any lead on the release compatibility issue?
>
>
> _
> From: Rajat Venkatesh < rvenkat...@qubole.com
> <javascript:_e(%7B%7D,'cvml','rvenkat...@qubole.com');>>
> Sent: Wednesday, February 3, 2016 10:05 PM
> Subject: Re: HBase Interpreter
> To: < users@zeppelin.incubator.apache.org
> <javascript:_e(%7B%7D,'cvml','users@zeppelin.incubator.apache.org');>>
>
>
> Oh. That should work. I've tested with 1.0.0. Hmm
>
> On Thu, Feb 4, 2016 at 10:50 AM Benjamin Kim < bbuil...@gmail.com
> <javascript:_e(%7B%7D,'cvml','bbuil...@gmail.com');>> wrote:
>
> Hi Rajat,
>
> The version of HBase that comes with CDH 5.4.8 is 1.0.0. How do I check if
> they are compatible?
>
> Thanks,
> Ben
>
>
> On Feb 3, 2016, at 9:16 PM, Rajat Venkatesh < rvenkat...@qubole.com
> <javascript:_e(%7B%7D,'cvml','rvenkat...@qubole.com');>> wrote:
>
> Can you check the version of HBase ? HBase interpreter has been tested
> with HBase 1.0.x and Hadoop 2.6.0. There is a good chance this error is due
> to mismatch in versions.
>
> On Thu, Feb 4, 2016 at 10:20 AM Benjamin Kim < bbuil...@gmail.com
> <javascript:_e(%7B%7D,'cvml','bbuil...@gmail.com');>> wrote:
>
> I got this error below trying out the new HBase Interpreter after pulling
> and compiling the latest.
>
> org.jruby.exceptions.RaiseException: (NameError) cannot load Java class
> org.apache.hadoop.hbase.quotas.ThrottleType
> at
> org.jruby.javasupport.JavaUtilities.get_proxy_or_package_under_package(org/jruby/javasupport/JavaUtilities.java:54)
>
> at (Anonymous).method_missing(/builtin/javasupport/java.rb:51)
> at
> (Anonymous).(root)(/opt/cloudera/parcels/CDH/lib/hbase/lib/ruby/hbase/quotas.rb:23)
>
> at org.jruby.RubyKernel.require(org/jruby/RubyKernel.java:1062)
> at
> (Anonymous).(root)(/opt/cloudera/parcels/CDH/lib/hbase/lib/ruby/hbase/quotas.rb:24)
>
> at org.jruby.RubyKernel.require(org/jruby/RubyKernel.java:1062)
> at
> (Anonymous).(root)(/opt/cloudera/parcels/CDH/lib/hbase/lib/ruby/hbase/hbase.rb:90)
>
> at org.jruby.RubyKernel.require(org/jruby/RubyKernel.java:1062)
> at
> (Anonymous).(root)(/opt/cloudera/parcels/CDH/lib/hbase/lib/ruby/hbase.rb:118)
>
>
> Is there something I’m missing? Is it because I’m using CDH 5.4.8?
>
> Thanks,
> Ben
>
>
>
>
>
>
>
>
>


Re: zeppelin multi user mode?

2016-02-03 Thread Benjamin Kim
I forgot to mention that I don’t see Spark 1.6 in the list of versions when 
installing z-manager.

> On Feb 3, 2016, at 10:08 PM, Corneau Damien <cornead...@gmail.com> wrote:
> 
> @Benjamin,
> We do support version 1.6 of Spark, see: 
> https://github.com/apache/incubator-zeppelin#spark-interpreter 
> <https://github.com/apache/incubator-zeppelin#spark-interpreter>
> 
> On Wed, Feb 3, 2016 at 9:47 PM, Benjamin Kim <bbuil...@gmail.com 
> <mailto:bbuil...@gmail.com>> wrote:
> I see that the latest version of Spark supported is 1.4.1. When will the 
> latest versions of Spark be supported?
> 
> Thanks,
> Ben
> 
> 
>> On Feb 3, 2016, at 7:54 PM, Hyung Sung Shim <hss...@nflabs.com 
>> <mailto:hss...@nflabs.com>> wrote:
>> 
>> Hello yunfeng.
>> 
>> You can also refer to 
>> https://github.com/NFLabs/z-manager/tree/master/multitenancy 
>> <https://github.com/NFLabs/z-manager/tree/master/multitenancy>.
>> 
>> Thanks. 
>> 
>> 2016-02-04 3:56 GMT+09:00 Christopher Matta <cma...@mapr.com 
>> <mailto:cma...@mapr.com>>:
>> I have had luck with a single Zepplin installation and  config directories 
>> in each user home directory. That way each user gets their own instance and 
>> will not interfere with each other. 
>> 
>> You can start the Zepplin server with a config flag pointing to the config 
>> directory. Simply copy the config dir that comes with Zepplin to ~/.zeppelin 
>> and edit the zeppelin-site.xml to change default port for each user. Start 
>> like this: 
>> ./zeppelin.sh --config ~/.zeppelin start
>> 
>> 
>> On Wednesday, February 3, 2016, Lin, Yunfeng <yunfeng@citi.com 
>> <mailto:yunfeng@citi.com>> wrote:
>> Hi guys,
>> 
>>  
>> 
>> We are planning to use zeppelin for PROD for data scientists. One feature we 
>> desperately need is multi user mode.
>> 
>>  
>> 
>> Currently, zeppelin is great for single user use. However, since zeppelin 
>> spark context are shared among all users in one zeppelin server, it is not 
>> very suitable when there are multiple users on the same zeppelin server 
>> since they are going to interfere with each other in one spark context.
>> 
>>  
>> 
>> How do you guys address this need? Thanks.
>> 
>>  
>> 
>> 
>> 
>> -- 
>> Chris Matta
>> cma...@mapr.com <mailto:cma...@mapr.com>
>> 215-701-3146 
>> 
> 
> 



Re: Is there a any plan to develop SPARK with c++??

2016-02-03 Thread Benjamin Kim
Hi DaeJin,

The closest thing I can think of is this.

https://databricks.com/blog/2015/04/28/project-tungsten-bringing-spark-closer-to-bare-metal.html

Cheers,
Ben

> On Feb 3, 2016, at 9:49 PM, DaeJin Jung  wrote:
> 
> hello everyone,
> I have a short question.
> 
> I would like to improve performance for SPARK framework using intel native 
> instruction or etc.. So, I wonder if there are any plans to develop SPARK 
> with C++ or C in the near future.
> 
> Please let me know if you have any informantion.
> 
> Best Regards,
> Daejin Jung



Re: csv dependencies loaded in %spark but not %sql in spark 1.6/zeppelin 0.5.6

2016-02-03 Thread Benjamin Kim
Same here. I want to know the answer too.


> On Feb 2, 2016, at 12:32 PM, Jonathan Kelly  wrote:
> 
> Hey, I just ran into that same exact issue yesterday and wasn't sure if I was 
> doing something wrong or what. Glad to know it's not just me! Unfortunately I 
> have not yet had the time to look any deeper into it. Would you mind filing a 
> JIRA if there isn't already one?
> 
> On Tue, Feb 2, 2016 at 12:29 PM Lin, Yunfeng  > wrote:
> Hi guys,
> 
>  
> 
> I load spark-csv dependencies in %spark, but not in %sql using apache 
> zeppelin 0.5.6 with spark 1.6.0. Everything is working fine in zeppelin 0.5.5 
> with spark 1.5 through
> 
>  
> 
> Do you have similar problems?
> 
>  
> 
> I am loading spark csv dependencies (https://github.com/databricks/spark-csv 
> )
> 
>  
> 
> Using:
> 
> %dep
> 
> z.load("PATH/commons-csv-1.1.jar")
> 
> z.load("PATH/spark-csv_2.10-1.3.0.jar")
> 
> z.load("PATH/univocity-parsers-1.5.1.jar")
> 
> z.load("PATH/scala-library-2.10.5.jar")
> 
>  
> 
> I am able to load a csv from hdfs using data frame API in spark. It is 
> running perfect fine.
> 
> %spark
> 
> val df = sqlContext.read
> 
> .format("com.databricks.spark.csv")
> 
> .option("header", "false") // Use first line of all files as header
> 
> .option("inferSchema", "true") // Automatically infer data types
> 
> .load("hdfs://sd-6f48-7fe6:8020/tmp/people.txt")   // this is a file in 
> HDFS
> 
> df.registerTempTable("people")
> 
> df.show()
> 
>  
> 
> This also work:
> 
> %spark
> 
> val df2=sqlContext.sql("select * from people")
> 
> df2.show()
> 
>  
> 
> But this doesn’t work….
> 
> %sql
> 
> select * from people
> 
>  
> 
> java.lang.ClassNotFoundException: 
> com.databricks.spark.csv.CsvRelation$$anonfun$1$$anonfun$2 at 
> java.net.URLClassLoader$1.run(URLClassLoader.java:366) at 
> java.net.URLClassLoader$1.run(URLClassLoader.java:355) at 
> java.security.AccessController.doPrivileged(Native Method) at 
> java.net.URLClassLoader.findClass(URLClassLoader.java:354) at 
> java.lang.ClassLoader.loadClass(ClassLoader.java:425) at 
> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at 
> java.lang.ClassLoader.loadClass(ClassLoader.java:358) at 
> java.lang.Class.forName0(Native Method) at 
> java.lang.Class.forName(Class.java:270) at 
> org.apache.spark.util.InnerClosureFinder$$anon$4.visitMethodInsn(ClosureCleaner.scala:435)
>  at org.apache.xbean.asm5.ClassReader.a(Unknown Source) at 
> org.apache.xbean.asm5.ClassReader.b(Unknown Source) at 
> org.apache.xbean.asm5.ClassReader.accept(Unknown Source) at 
> org.apache.xbean.asm5.ClassReader.accept(Unknown Source) at 
> org.apache.spark.util.ClosureCleaner$.getInnerClosureClasses(ClosureCleaner.scala:84)
>  at 
> org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:187)
>  at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122) at 
> org.apache.spark.SparkContext.clean(SparkContext.scala:2055) at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:707) at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:706) at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>  at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) at 
> org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:706) at 
> com.databricks.spark.csv.CsvRelation.tokenRdd(CsvRelation.scala:90) at 
> com.databricks.spark.csv.CsvRelation.buildScan(CsvRelation.scala:104) at 
> com.databricks.spark.csv.CsvRelation.buildScan(CsvRelation.scala:152) at 
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$4.apply(DataSourceStrategy.scala:64)
>  at 
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$4.apply(DataSourceStrategy.scala:64)
>  at 
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:274)
>  at 
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:273)
>  at 
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProjectRaw(DataSourceStrategy.scala:352)
>  at 
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProject(DataSourceStrategy.scala:269)
>  at 
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$.apply(DataSourceStrategy.scala:60)
>  at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>  at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>  at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
>  at 
> 

Re: Spark with SAS

2016-02-03 Thread Benjamin Kim
You can download the Spark ODBC Driver.

https://databricks.com/spark/odbc-driver-download


> On Feb 3, 2016, at 10:09 AM, Jörn Franke  wrote:
> 
> This could be done through odbc. Keep in mind that you can run SaS jobs 
> directly on a Hadoop cluster using the SaS embedded process engine or dump 
> some data to SaS lasr cluster, but you better ask SaS about this.
> 
>> On 03 Feb 2016, at 18:43, Sourav Mazumder  
>> wrote:
>> 
>> Hi,
>> 
>> Is anyone aware of any work going on for integrating Spark with SAS for 
>> executing queries in Spark?
>> 
>> For example calling Spark Jobs from SAS using Spark SQL through Spark SQL's 
>> JDBC/ODBC library.
>> 
>> Regards,
>> Sourav
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> 


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: csv dependencies loaded in %spark but not %sql in spark 1.6/zeppelin 0.5.6

2016-02-02 Thread Benjamin Kim
Same here. I want to know the answer too.


> On Feb 2, 2016, at 12:32 PM, Jonathan Kelly  wrote:
> 
> Hey, I just ran into that same exact issue yesterday and wasn't sure if I was 
> doing something wrong or what. Glad to know it's not just me! Unfortunately I 
> have not yet had the time to look any deeper into it. Would you mind filing a 
> JIRA if there isn't already one?
> 
> On Tue, Feb 2, 2016 at 12:29 PM Lin, Yunfeng  > wrote:
> Hi guys,
> 
>  
> 
> I load spark-csv dependencies in %spark, but not in %sql using apache 
> zeppelin 0.5.6 with spark 1.6.0. Everything is working fine in zeppelin 0.5.5 
> with spark 1.5 through
> 
>  
> 
> Do you have similar problems?
> 
>  
> 
> I am loading spark csv dependencies (https://github.com/databricks/spark-csv 
> )
> 
>  
> 
> Using:
> 
> %dep
> 
> z.load("PATH/commons-csv-1.1.jar")
> 
> z.load("PATH/spark-csv_2.10-1.3.0.jar")
> 
> z.load("PATH/univocity-parsers-1.5.1.jar")
> 
> z.load("PATH/scala-library-2.10.5.jar")
> 
>  
> 
> I am able to load a csv from hdfs using data frame API in spark. It is 
> running perfect fine.
> 
> %spark
> 
> val df = sqlContext.read
> 
> .format("com.databricks.spark.csv")
> 
> .option("header", "false") // Use first line of all files as header
> 
> .option("inferSchema", "true") // Automatically infer data types
> 
> .load("hdfs://sd-6f48-7fe6:8020/tmp/people.txt")   // this is a file in 
> HDFS
> 
> df.registerTempTable("people")
> 
> df.show()
> 
>  
> 
> This also work:
> 
> %spark
> 
> val df2=sqlContext.sql("select * from people")
> 
> df2.show()
> 
>  
> 
> But this doesn’t work….
> 
> %sql
> 
> select * from people
> 
>  
> 
> java.lang.ClassNotFoundException: 
> com.databricks.spark.csv.CsvRelation$$anonfun$1$$anonfun$2 at 
> java.net.URLClassLoader$1.run(URLClassLoader.java:366) at 
> java.net.URLClassLoader$1.run(URLClassLoader.java:355) at 
> java.security.AccessController.doPrivileged(Native Method) at 
> java.net.URLClassLoader.findClass(URLClassLoader.java:354) at 
> java.lang.ClassLoader.loadClass(ClassLoader.java:425) at 
> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at 
> java.lang.ClassLoader.loadClass(ClassLoader.java:358) at 
> java.lang.Class.forName0(Native Method) at 
> java.lang.Class.forName(Class.java:270) at 
> org.apache.spark.util.InnerClosureFinder$$anon$4.visitMethodInsn(ClosureCleaner.scala:435)
>  at org.apache.xbean.asm5.ClassReader.a(Unknown Source) at 
> org.apache.xbean.asm5.ClassReader.b(Unknown Source) at 
> org.apache.xbean.asm5.ClassReader.accept(Unknown Source) at 
> org.apache.xbean.asm5.ClassReader.accept(Unknown Source) at 
> org.apache.spark.util.ClosureCleaner$.getInnerClosureClasses(ClosureCleaner.scala:84)
>  at 
> org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:187)
>  at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122) at 
> org.apache.spark.SparkContext.clean(SparkContext.scala:2055) at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:707) at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:706) at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
>  at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) at 
> org.apache.spark.rdd.RDD.mapPartitions(RDD.scala:706) at 
> com.databricks.spark.csv.CsvRelation.tokenRdd(CsvRelation.scala:90) at 
> com.databricks.spark.csv.CsvRelation.buildScan(CsvRelation.scala:104) at 
> com.databricks.spark.csv.CsvRelation.buildScan(CsvRelation.scala:152) at 
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$4.apply(DataSourceStrategy.scala:64)
>  at 
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$4.apply(DataSourceStrategy.scala:64)
>  at 
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:274)
>  at 
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:273)
>  at 
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProjectRaw(DataSourceStrategy.scala:352)
>  at 
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$.pruneFilterProject(DataSourceStrategy.scala:269)
>  at 
> org.apache.spark.sql.execution.datasources.DataSourceStrategy$.apply(DataSourceStrategy.scala:60)
>  at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>  at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>  at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
>  at 
> 

Re: [ANNOUNCE] New SAMBA Package = Spark + AWS Lambda

2016-02-02 Thread Benjamin Kim
Hi David,

My company uses Lambda to do simple data moving and processing with Python 
scripts. I can see that using Spark instead for the data processing would turn it 
into a real production-level platform. Does this pave the way toward replacing 
the need for a pre-instantiated cluster in AWS or purchased hardware in a 
datacenter? If so, this would be a great efficiency gain and an easier 
entry point for Spark usage. I hope the vision is to get rid of all cluster 
management when using Spark.

Thanks,
Ben


> On Feb 1, 2016, at 4:23 AM, David Russell  wrote:
> 
> Hi all,
> 
> Just sharing news of the release of a newly available Spark package, SAMBA 
> . 
> 
> 
> https://github.com/onetapbeyond/lambda-spark-executor 
> 
> 
> SAMBA is an Apache Spark package offering seamless integration with the AWS 
> Lambda  compute service for Spark batch and 
> streaming applications on the JVM.
> 
> Within traditional Spark deployments RDD tasks are executed using fixed 
> compute resources on worker nodes within the Spark cluster. With SAMBA, 
> application developers can delegate selected RDD tasks to execute using 
> on-demand AWS Lambda compute infrastructure in the cloud.
> 
> Not unlike the recently released ROSE 
>  package that extends 
> the capabilities of traditional Spark applications with support for CRAN R 
> analytics, SAMBA provides another (hopefully) useful extension for Spark 
> application developers on the JVM.
> 
> SAMBA Spark Package: https://github.com/onetapbeyond/lambda-spark-executor 
> 
> ROSE Spark Package: https://github.com/onetapbeyond/opencpu-spark-executor 
> 
> 
> Questions, suggestions, feedback welcome.
> 
> David
> 
> -- 
> "All that is gold does not glitter, Not all those who wander are lost."



Re: Upgrade spark to 1.6.0

2016-02-01 Thread Benjamin Kim
Hi Felix,

After installing Spark 1.6, I built Zeppelin using:

mvn clean package -Pspark-1.6 -Dspark.version=1.6.0 
-Dhadoop.version=2.6.0-cdh5.4.8 -Phadoop-2.6 -Pyarn -Ppyspark -Pvendor-repo 
-DskipTests

This worked for me.
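
If you would rather run against an external Spark 1.6 install instead of the embedded build, I believe you can also point Zeppelin at it through conf/zeppelin-env.sh (the path below is only an example):

export SPARK_HOME=/opt/spark-1.6.0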

Cheers,
Ben


> On Feb 1, 2016, at 7:44 PM, Felix Cheung  wrote:
> 
> Hi
> 
> You can see the build command line example here for spark 1.6 profile
> 
> https://github.com/apache/incubator-zeppelin/blob/master/README.md
> 
> 
> 
> 
> 
> On Mon, Feb 1, 2016 at 3:59 PM -0800, "Daniel Valdivia" 
> > wrote:
> 
> Hi,
> 
> I'd like to ask if there's an easy way to upgrade spark to 1.6.0 from the 
> current 1.4.x that's bundled with the current release of zepellin, would 
> updating the pom.xml and compiling suffice ?
> 
> Cheers



Re: Spark SQL 1.5.2 missing JDBC driver for PostgreSQL?

2015-12-26 Thread Benjamin Kim
Chris,

I have a question about your setup. Does it allow the same usage of 
Cassandra/HBase data sources? Can I create a table that links to one of them and 
can be queried from Spark SQL? The reason I ask is that I see the Cassandra 
connector package included in your script.
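
For example, what I would like to be able to do is something along these lines (just a sketch; I am assuming the connector exposes a data source named org.apache.spark.sql.cassandra and that keyspace/table are the right option names):

CREATE TEMPORARY TABLE events
USING org.apache.spark.sql.cassandra
OPTIONS (
  keyspace "my_keyspace",
  table "events"
);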

Thanks,
Ben

> On Dec 25, 2015, at 6:41 AM, Chris Fregly <ch...@fregly.com> wrote:
> 
> Configuring JDBC drivers with Spark is a bit tricky as the JDBC driver needs 
> to be on the Java System Classpath per this 
> <http://spark.apache.org/docs/latest/sql-programming-guide.html#troubleshooting>
>  troubleshooting section in the Spark SQL programming guide.
> 
> Here 
> <https://github.com/fluxcapacitor/pipeline/blob/master/bin/start-hive-thriftserver.sh>
>  is an example hive-thrift-server start script from my Spark-based reference 
> pipeline project.  Here 
> <https://github.com/fluxcapacitor/pipeline/blob/master/bin/pipeline-spark-sql.sh>
>  is an example script that decorates the out-of-the-box spark-sql command to 
> use the MySQL JDBC driver.
> 
> These scripts explicitly set --jars to $SPARK_SUBMIT_JARS which is defined 
> here 
> <https://github.com/fluxcapacitor/pipeline/blob/master/config/bash/.profile#L144>
>  and here 
> <https://github.com/fluxcapacitor/pipeline/blob/master/config/bash/.profile#L87>
>  and includes the path to the local MySQL JDBC driver.  This approach is 
> described here 
> <http://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management>
>  in the Spark docs that describe the advanced spark-submit options.  
> 
> Any jar specified with --jars will be passed to each worker node in the 
> cluster - specifically in the work directory for each SparkContext for 
> isolation purposes.
> 
> Cleanup of these jars on the worker nodes is handled by YARN automatically, 
> and by Spark Standalone per the spark.worker.cleanup.appDataTtl config param.
> 
> The Spark SQL programming guide says to use SPARK_CLASSPATH for this purpose, 
> but I couldn't get this to work for whatever reason, so i'm sticking to the 
> --jars approach used in my examples.
> 
> On Tue, Dec 22, 2015 at 9:51 PM, Benjamin Kim <bbuil...@gmail.com 
> <mailto:bbuil...@gmail.com>> wrote:
> Stephen,
> 
> Let me confirm. I just need to propagate these settings I put in 
> spark-defaults.conf to all the worker nodes? Do I need to do the same with 
> the PostgreSQL driver jar file too? If so, is there a way to have it read 
> from HDFS rather than copying out to the cluster manually. 
> 
> Thanks for your help,
> Ben
> 
> 
> On Tuesday, December 22, 2015, Stephen Boesch <java...@gmail.com 
> <mailto:java...@gmail.com>> wrote:
> HI Benjamin,  yes by adding to the thrift server then the create table would 
> work.  But querying is performed by the workers: so you need to add to the 
> classpath of all nodes for reads to work.
> 
> 2015-12-22 18:35 GMT-08:00 Benjamin Kim <bbuil...@gmail.com <>>:
> Hi Stephen,
> 
> I forgot to mention that I added these lines below to the spark-default.conf 
> on the node with Spark SQL Thrift JDBC/ODBC Server running on it. Then, I 
> restarted it.
> 
> spark.driver.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar
> spark.executor.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar
> 
> I read in another thread that this would work. I was able to create the table 
> and could see it in my SHOW TABLES list. But, when I try to query the table, 
> I get the same error. It looks like I’m getting close.
> 
> Are there any other things that I have to do that you can think of?
> 
> Thanks,
> Ben
> 
> 
>> On Dec 22, 2015, at 6:25 PM, Stephen Boesch <java...@gmail.com <>> wrote:
>> 
>> The postgres jdbc driver needs to be added to the  classpath of your spark 
>> workers.  You can do a search for how to do that (multiple ways).
>> 
>> 2015-12-22 17:22 GMT-08:00 b2k70 <bbuil...@gmail.com <>>:
>> I see in the Spark SQL documentation that a temporary table can be created
>> directly onto a remote PostgreSQL table.
>> 
>> CREATE TEMPORARY TABLE 
>> USING org.apache.spark.sql.jdbc
>> OPTIONS (
>> url "jdbc:postgresql:///",
>> dbtable "impressions"
>> );
>> When I run this against our PostgreSQL server, I get the following error.
>> 
>> Error: java.sql.SQLException: No suitable driver found for
>> jdbc:postgresql:/// (state=,code=0)
>> 
>> Can someone help me understand why this is?
>> 
>> Thanks, Ben
>> 
>> 
>> 
>> --
>> View this message in context: 
>> http://apache-spark-user-

Re: Spark SQL 1.5.2 missing JDBC driver for PostgreSQL?

2015-12-25 Thread Benjamin Kim
Hi Chris,

I did what you did. It works for me now! Thanks for your help.

Have a Merry Christmas!

Cheers,
Ben


> On Dec 25, 2015, at 6:41 AM, Chris Fregly <ch...@fregly.com> wrote:
> 
> Configuring JDBC drivers with Spark is a bit tricky as the JDBC driver needs 
> to be on the Java System Classpath per this 
> <http://spark.apache.org/docs/latest/sql-programming-guide.html#troubleshooting>
>  troubleshooting section in the Spark SQL programming guide.
> 
> Here 
> <https://github.com/fluxcapacitor/pipeline/blob/master/bin/start-hive-thriftserver.sh>
>  is an example hive-thrift-server start script from my Spark-based reference 
> pipeline project.  Here 
> <https://github.com/fluxcapacitor/pipeline/blob/master/bin/pipeline-spark-sql.sh>
>  is an example script that decorates the out-of-the-box spark-sql command to 
> use the MySQL JDBC driver.
> 
> These scripts explicitly set --jars to $SPARK_SUBMIT_JARS which is defined 
> here 
> <https://github.com/fluxcapacitor/pipeline/blob/master/config/bash/.profile#L144>
>  and here 
> <https://github.com/fluxcapacitor/pipeline/blob/master/config/bash/.profile#L87>
>  and includes the path to the local MySQL JDBC driver.  This approach is 
> described here 
> <http://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management>
>  in the Spark docs that describe the advanced spark-submit options.  
> 
> Any jar specified with --jars will be passed to each worker node in the 
> cluster - specifically in the work directory for each SparkContext for 
> isolation purposes.
> 
> Cleanup of these jars on the worker nodes is handled by YARN automatically, 
> and by Spark Standalone per the spark.worker.cleanup.appDataTtl config param.
> 
> The Spark SQL programming guide says to use SPARK_CLASSPATH for this purpose, 
> but I couldn't get this to work for whatever reason, so i'm sticking to the 
> --jars approach used in my examples.
> 
> On Tue, Dec 22, 2015 at 9:51 PM, Benjamin Kim <bbuil...@gmail.com 
> <mailto:bbuil...@gmail.com>> wrote:
> Stephen,
> 
> Let me confirm. I just need to propagate these settings I put in 
> spark-defaults.conf to all the worker nodes? Do I need to do the same with 
> the PostgreSQL driver jar file too? If so, is there a way to have it read 
> from HDFS rather than copying out to the cluster manually. 
> 
> Thanks for your help,
> Ben
> 
> 
> On Tuesday, December 22, 2015, Stephen Boesch <java...@gmail.com 
> <mailto:java...@gmail.com>> wrote:
> HI Benjamin,  yes by adding to the thrift server then the create table would 
> work.  But querying is performed by the workers: so you need to add to the 
> classpath of all nodes for reads to work.
> 
> 2015-12-22 18:35 GMT-08:00 Benjamin Kim <bbuil...@gmail.com <>>:
> Hi Stephen,
> 
> I forgot to mention that I added these lines below to the spark-default.conf 
> on the node with Spark SQL Thrift JDBC/ODBC Server running on it. Then, I 
> restarted it.
> 
> spark.driver.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar
> spark.executor.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar
> 
> I read in another thread that this would work. I was able to create the table 
> and could see it in my SHOW TABLES list. But, when I try to query the table, 
> I get the same error. It looks like I’m getting close.
> 
> Are there any other things that I have to do that you can think of?
> 
> Thanks,
> Ben
> 
> 
>> On Dec 22, 2015, at 6:25 PM, Stephen Boesch <java...@gmail.com <>> wrote:
>> 
>> The postgres jdbc driver needs to be added to the  classpath of your spark 
>> workers.  You can do a search for how to do that (multiple ways).
>> 
>> 2015-12-22 17:22 GMT-08:00 b2k70 <bbuil...@gmail.com <>>:
>> I see in the Spark SQL documentation that a temporary table can be created
>> directly onto a remote PostgreSQL table.
>> 
>> CREATE TEMPORARY TABLE 
>> USING org.apache.spark.sql.jdbc
>> OPTIONS (
>> url "jdbc:postgresql:///",
>> dbtable "impressions"
>> );
>> When I run this against our PostgreSQL server, I get the following error.
>> 
>> Error: java.sql.SQLException: No suitable driver found for
>> jdbc:postgresql:/// (state=,code=0)
>> 
>> Can someone help me understand why this is?
>> 
>> Thanks, Ben
>> 
>> 
>> 
>> --
>> View this message in context: 
>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-1-5-2-missing-JDBC-driver-for-PostgreSQL-tp25773.html
>>  
>> <http://apache-spark-user-list.1001560.n3.nab

Re: Spark SQL 1.5.2 missing JDBC driver for PostgreSQL?

2015-12-22 Thread Benjamin Kim
Hi Stephen,

I forgot to mention that I added the lines below to spark-defaults.conf on the 
node running the Spark SQL Thrift JDBC/ODBC server. Then I restarted it.

spark.driver.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar
spark.executor.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar

I read in another thread that this would work. I was able to create the table 
and could see it in my SHOW TABLES list. But, when I try to query the table, I 
get the same error. It looks like I’m getting close.
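
One thing I can still try is naming the driver class explicitly in the OPTIONS clause, which I believe the JDBC data source supports through a "driver" option, e.g. (connection details elided as before):

CREATE TEMPORARY TABLE impressions
USING org.apache.spark.sql.jdbc
OPTIONS (
  url "jdbc:postgresql://<host>/<db>",
  dbtable "impressions",
  driver "org.postgresql.Driver"
);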

Are there any other things that I have to do that you can think of?

Thanks,
Ben


> On Dec 22, 2015, at 6:25 PM, Stephen Boesch  wrote:
> 
> The postgres jdbc driver needs to be added to the  classpath of your spark 
> workers.  You can do a search for how to do that (multiple ways).
> 
> 2015-12-22 17:22 GMT-08:00 b2k70  >:
> I see in the Spark SQL documentation that a temporary table can be created
> directly onto a remote PostgreSQL table.
> 
> CREATE TEMPORARY TABLE 
> USING org.apache.spark.sql.jdbc
> OPTIONS (
> url "jdbc:postgresql:///",
> dbtable "impressions"
> );
> When I run this against our PostgreSQL server, I get the following error.
> 
> Error: java.sql.SQLException: No suitable driver found for
> jdbc:postgresql:/// (state=,code=0)
> 
> Can someone help me understand why this is?
> 
> Thanks, Ben
> 
> 
> 
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-1-5-2-missing-JDBC-driver-for-PostgreSQL-tp25773.html
>  
> 
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org 
> 
> For additional commands, e-mail: user-h...@spark.apache.org 
> 
> 
> 



Re: Spark SQL 1.5.2 missing JDBC driver for PostgreSQL?

2015-12-22 Thread Benjamin Kim
Stephen,

Let me confirm. I just need to propagate these settings I put in
spark-defaults.conf to all the worker nodes? Do I need to do the same with
the PostgreSQL driver jar file too? If so, is there a way to have it read
from HDFS rather than copying it out to the cluster manually?

Thanks for your help,
Ben

On Tuesday, December 22, 2015, Stephen Boesch <java...@gmail.com> wrote:

> HI Benjamin,  yes by adding to the thrift server then the create table
> would work.  But querying is performed by the workers: so you need to add
> to the classpath of all nodes for reads to work.
>
> 2015-12-22 18:35 GMT-08:00 Benjamin Kim <bbuil...@gmail.com
> <javascript:_e(%7B%7D,'cvml','bbuil...@gmail.com');>>:
>
>> Hi Stephen,
>>
>> I forgot to mention that I added these lines below to the
>> spark-default.conf on the node with Spark SQL Thrift JDBC/ODBC Server
>> running on it. Then, I restarted it.
>>
>> spark.driver.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar
>>
>> spark.executor.extraClassPath=/usr/share/java/postgresql-9.3-1104.jdbc41.jar
>>
>> I read in another thread that this would work. I was able to create the
>> table and could see it in my SHOW TABLES list. But, when I try to query the
>> table, I get the same error. It looks like I’m getting close.
>>
>> Are there any other things that I have to do that you can think of?
>>
>> Thanks,
>> Ben
>>
>>
>> On Dec 22, 2015, at 6:25 PM, Stephen Boesch <java...@gmail.com
>> <javascript:_e(%7B%7D,'cvml','java...@gmail.com');>> wrote:
>>
>> The postgres jdbc driver needs to be added to the  classpath of your
>> spark workers.  You can do a search for how to do that (multiple ways).
>>
>> 2015-12-22 17:22 GMT-08:00 b2k70 <bbuil...@gmail.com
>> <javascript:_e(%7B%7D,'cvml','bbuil...@gmail.com');>>:
>>
>>> I see in the Spark SQL documentation that a temporary table can be
>>> created
>>> directly onto a remote PostgreSQL table.
>>>
>>> CREATE TEMPORARY TABLE 
>>> USING org.apache.spark.sql.jdbc
>>> OPTIONS (
>>> url "jdbc:postgresql:///",
>>> dbtable "impressions"
>>> );
>>> When I run this against our PostgreSQL server, I get the following error.
>>>
>>> Error: java.sql.SQLException: No suitable driver found for
>>> jdbc:postgresql:///
>>> (state=,code=0)
>>>
>>> Can someone help me understand why this is?
>>>
>>> Thanks, Ben
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-1-5-2-missing-JDBC-driver-for-PostgreSQL-tp25773.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com
>>> <http://nabble.com>.
>>>
>>> -
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> <javascript:_e(%7B%7D,'cvml','user-unsubscr...@spark.apache.org');>
>>> For additional commands, e-mail: user-h...@spark.apache.org
>>> <javascript:_e(%7B%7D,'cvml','user-h...@spark.apache.org');>
>>>
>>>
>>
>>
>


[jira] [Resolved] (RANGER-345) enable-agent.sh isn't a file

2015-04-05 Thread Benjamin Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/RANGER-345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Kim resolved RANGER-345.
-
Resolution: Not A Problem

I tried building Ranger on a different network, and it works.

It was a network issue, so I'm closing this one down.

 enable-agent.sh isn't a file
 

 Key: RANGER-345
 URL: https://issues.apache.org/jira/browse/RANGER-345
 Project: Ranger
  Issue Type: Bug
  Components: admin
Affects Versions: 0.4.0
 Environment: centos 6.6 
 jdk1.7.0_71
 maven 3.3.1
Reporter: Benjamin Kim

 I downloaded the tagged version of Ranger 0.4 from GitHub.
 I ran this command
 mvn -DskipTests clean compile package install assembly:assembly -e
 I get this result when building Security Admin Web Application module
 [ERROR] Failed to execute goal 
 org.apache.maven.plugins:maven-assembly-plugin:2.2-beta-5:assembly 
 (default-cli) on project security-admin-web: Failed to create assembly: Error 
 adding file to archive: 
 /root/incubator-ranger-ranger-0.4/security-admin/agents-common/scripts/enable-agent.sh
  isn't a file. - [Help 1]
 org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute 
 goal org.apache.maven.plugins:maven-assembly-plugin:2.2-beta-5:assembly 
 (default-cli) on project security-admin-web: Failed to create assembly: Error 
 adding file to archive: 
 /root/incubator-ranger-ranger-0.4/security-admin/agents-common/scripts/enable-agent.sh
  isn't a file.
   at 
 org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:216)
   at 
 org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
   at 
 org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
   at 
 org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
   at 
 org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
   at 
 org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
   at 
 org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
   at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307)
   at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193)
   at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:106)
   at org.apache.maven.cli.MavenCli.execute(MavenCli.java:862)
   at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:286)
   at org.apache.maven.cli.MavenCli.main(MavenCli.java:197)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
   at 
 org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
   at 
 org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
   at 
 org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
 Caused by: org.apache.maven.plugin.MojoExecutionException: Failed to create 
 assembly: Error adding file to archive: 
 /root/incubator-ranger-ranger-0.4/security-admin/agents-common/scripts/enable-agent.sh
  isn't a file.
   at 
 org.apache.maven.plugin.assembly.mojos.AbstractAssemblyMojo.execute(AbstractAssemblyMojo.java:429)
   at 
 org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
   at 
 org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
   ... 20 more
 Caused by: org.apache.maven.plugin.assembly.archive.ArchiveCreationException: 
 Error adding file to archive: 
 /root/incubator-ranger-ranger-0.4/security-admin/agents-common/scripts/enable-agent.sh
  isn't a file.
   at 
 org.apache.maven.plugin.assembly.archive.phase.FileItemAssemblyPhase.execute(FileItemAssemblyPhase.java:126)
   at 
 org.apache.maven.plugin.assembly.archive.DefaultAssemblyArchiver.createArchive(DefaultAssemblyArchiver.java:190)
   at 
 org.apache.maven.plugin.assembly.mojos.AbstractAssemblyMojo.execute(AbstractAssemblyMojo.java:378)
   ... 22 more
 Caused by: org.codehaus.plexus.archiver.ArchiverException: 
 /root/incubator-ranger-ranger-0.4/security-admin/agents-common/scripts/enable-agent.sh
  isn't a file.
   at 
 org.codehaus.plexus.archiver.AbstractArchiver.addFile(AbstractArchiver.java:348)
   at 
 org.apache.maven.plugin.assembly.archive.archiver.AssemblyProxyArchiver.addFile(AssemblyProxyArchiver.java:448

[jira] [Created] (RANGER-345) enable-agent.sh isn't a file

2015-03-27 Thread Benjamin Kim (JIRA)
Benjamin Kim created RANGER-345:
---

 Summary: enable-agent.sh isn't a file
 Key: RANGER-345
 URL: https://issues.apache.org/jira/browse/RANGER-345
 Project: Ranger
  Issue Type: Bug
  Components: admin
Affects Versions: 0.4.0
 Environment: centos 6.6 
jdk1.7.0_71
maven 3.3.1
Reporter: Benjamin Kim


I downloaded the tagged version of Ranger 0.4 from GitHub.

I ran this command:
mvn -DskipTests clean compile package install assembly:assembly -e

I get this result when building the Security Admin Web Application module:

[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-assembly-plugin:2.2-beta-5:assembly 
(default-cli) on project security-admin-web: Failed to create assembly: Error 
adding file to archive: 
/root/incubator-ranger-ranger-0.4/security-admin/agents-common/scripts/enable-agent.sh
 isn't a file. - [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal 
org.apache.maven.plugins:maven-assembly-plugin:2.2-beta-5:assembly 
(default-cli) on project security-admin-web: Failed to create assembly: Error 
adding file to archive: 
/root/incubator-ranger-ranger-0.4/security-admin/agents-common/scripts/enable-agent.sh
 isn't a file.
at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:216)
at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
at 
org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
at 
org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:307)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:193)
at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:106)
at org.apache.maven.cli.MavenCli.execute(MavenCli.java:862)
at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:286)
at org.apache.maven.cli.MavenCli.main(MavenCli.java:197)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
at 
org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
at 
org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
at 
org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
Caused by: org.apache.maven.plugin.MojoExecutionException: Failed to create 
assembly: Error adding file to archive: 
/root/incubator-ranger-ranger-0.4/security-admin/agents-common/scripts/enable-agent.sh
 isn't a file.
at 
org.apache.maven.plugin.assembly.mojos.AbstractAssemblyMojo.execute(AbstractAssemblyMojo.java:429)
at 
org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
... 20 more
Caused by: org.apache.maven.plugin.assembly.archive.ArchiveCreationException: 
Error adding file to archive: 
/root/incubator-ranger-ranger-0.4/security-admin/agents-common/scripts/enable-agent.sh
 isn't a file.
at 
org.apache.maven.plugin.assembly.archive.phase.FileItemAssemblyPhase.execute(FileItemAssemblyPhase.java:126)
at 
org.apache.maven.plugin.assembly.archive.DefaultAssemblyArchiver.createArchive(DefaultAssemblyArchiver.java:190)
at 
org.apache.maven.plugin.assembly.mojos.AbstractAssemblyMojo.execute(AbstractAssemblyMojo.java:378)
... 22 more
Caused by: org.codehaus.plexus.archiver.ArchiverException: 
/root/incubator-ranger-ranger-0.4/security-admin/agents-common/scripts/enable-agent.sh
 isn't a file.
at 
org.codehaus.plexus.archiver.AbstractArchiver.addFile(AbstractArchiver.java:348)
at 
org.apache.maven.plugin.assembly.archive.archiver.AssemblyProxyArchiver.addFile(AssemblyProxyArchiver.java:448)
at 
org.apache.maven.plugin.assembly.archive.phase.FileItemAssemblyPhase.execute(FileItemAssemblyPhase.java:122)
... 24 more
[ERROR] 
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read

[jira] [Updated] (MAPREDUCE-4718) MapReduce fails If I pass a parameter as a S3 folder

2014-04-21 Thread Benjamin Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Kim updated MAPREDUCE-4718:


Target Version/s: 0.23.3, 1.0.3  (was: 1.0.3, 0.23.3, 2.0.0-alpha, 
2.0.1-alpha)

 MapReduce fails If I pass a parameter as a S3 folder
 

 Key: MAPREDUCE-4718
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4718
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: job submission
Affects Versions: 1.0.0, 1.0.3
 Environment: Hadoop with default configurations
Reporter: Benjamin Kim

 I'm running a wordcount MR as follows
 hadoop jar WordCount.jar wordcount.WordCountDriver 
 s3n://bucket/wordcount/input s3n://bucket/wordcount/output
  
 s3n://bucket/wordcount/input is an S3 object that contains other input files.
 However, I get the following NPE error:
 12/10/02 18:56:23 INFO mapred.JobClient:  map 0% reduce 0%
 12/10/02 18:56:54 INFO mapred.JobClient:  map 50% reduce 0%
 12/10/02 18:56:56 INFO mapred.JobClient: Task Id : 
 attempt_201210021853_0001_m_01_0, Status : FAILED
 java.lang.NullPointerException
 at 
 org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.close(NativeS3FileSystem.java:106)
 at java.io.BufferedInputStream.close(BufferedInputStream.java:451)
 at java.io.FilterInputStream.close(FilterInputStream.java:155)
 at org.apache.hadoop.util.LineReader.close(LineReader.java:83)
 at 
 org.apache.hadoop.mapreduce.lib.input.LineRecordReader.close(LineRecordReader.java:144)
 at 
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.close(MapTask.java:497)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
 at org.apache.hadoop.mapred.Child.main(Child.java:249)
 MR runs fine if I specify a more specific input path, such as 
 s3n://bucket/wordcount/input/file.txt
 MR fails if I pass an S3 folder as a parameter.
 In summary,
 This works:
  hadoop jar ./hadoop-examples-1.0.3.jar wordcount 
 /user/hadoop/wordcount/input/ s3n://bucket/wordcount/output/
 This doesn't work:
  hadoop jar ./hadoop-examples-1.0.3.jar wordcount 
 s3n://bucket/wordcount/input/ s3n://bucket/wordcount/output/
 (both input paths are directories)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-4718) MapReduce fails If I pass a parameter as a S3 folder

2014-04-20 Thread Benjamin Kim (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13975383#comment-13975383
 ] 

Benjamin Kim commented on MAPREDUCE-4718:
-

Hi Chen,
I tested it with CDH 4.5.0 (hadoop-2.0.0+1518), and it doesn't seem to have the same 
problem. I'm able to successfully run a wordcount MRv1 job with the s3n protocol.
So is it safe to say this issue is fixed on the 2.x.x versions?

 MapReduce fails If I pass a parameter as a S3 folder
 

 Key: MAPREDUCE-4718
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4718
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: job submission
Affects Versions: 1.0.0, 1.0.3
 Environment: Hadoop with default configurations
Reporter: Benjamin Kim

 I'm running a wordcount MR as follows
 hadoop jar WordCount.jar wordcount.WordCountDriver 
 s3n://bucket/wordcount/input s3n://bucket/wordcount/output
  
 s3n://bucket/wordcount/input is an S3 object that contains other input files.
 However, I get the following NPE error:
 12/10/02 18:56:23 INFO mapred.JobClient:  map 0% reduce 0%
 12/10/02 18:56:54 INFO mapred.JobClient:  map 50% reduce 0%
 12/10/02 18:56:56 INFO mapred.JobClient: Task Id : 
 attempt_201210021853_0001_m_01_0, Status : FAILED
 java.lang.NullPointerException
 at 
 org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.close(NativeS3FileSystem.java:106)
 at java.io.BufferedInputStream.close(BufferedInputStream.java:451)
 at java.io.FilterInputStream.close(FilterInputStream.java:155)
 at org.apache.hadoop.util.LineReader.close(LineReader.java:83)
 at 
 org.apache.hadoop.mapreduce.lib.input.LineRecordReader.close(LineRecordReader.java:144)
 at 
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.close(MapTask.java:497)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
 at org.apache.hadoop.mapred.Child.main(Child.java:249)
 MR runs fine if I specify a more specific input path, such as 
 s3n://bucket/wordcount/input/file.txt
 MR fails if I pass an S3 folder as a parameter.
 In summary,
 This works:
  hadoop jar ./hadoop-examples-1.0.3.jar wordcount 
 /user/hadoop/wordcount/input/ s3n://bucket/wordcount/output/
 This doesn't work:
  hadoop jar ./hadoop-examples-1.0.3.jar wordcount 
 s3n://bucket/wordcount/input/ s3n://bucket/wordcount/output/
 (both input paths are directories)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HIVE-6230) Hive UDAF with subquery runs all logic on reducers

2014-01-19 Thread Benjamin Kim (JIRA)
Benjamin Kim created HIVE-6230:
--

 Summary: Hive UDAF with subquery runs all logic on reducers
 Key: HIVE-6230
 URL: https://issues.apache.org/jira/browse/HIVE-6230
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 0.10.0
Reporter: Benjamin Kim


When I have a subquery in my custom-built UDAF, all of iterate, 
terminatePartial, merge, and terminate run on the reducers only, whereas iterate 
and terminatePartial should run on the mappers.

Now, I don't know if this is by design, but this behavior leads to very long 
execution times on the reducers and creates large temporary files from them.

This happened to me with SimpleUDAF. I haven't tested it with GenericUDAF.

Here is an example:
SELECT MyUDAF(col1) FROM (
  SELECT * FROM test) t
GROUP BY col2




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


Re: [b2g] ZTE Open constantly restarts.

2014-01-17 Thread benjamin . kim . nguyen
On Friday, November 22, 2013 4:41:21 PM UTC+1, Ernesto Acosta wrote:
 After installing the ROM of ZTE  offers for Spain TME, my phone started to 
 run weird. With version 1.1 of FirefoxOS, the touch did not work well at all, 
 even at times had to lock / unlock the screen using the power key to make it 
 work.
 
 
 
 I tried to return to the previous version, but I could not. I tried other 
 ROMs but I could not. Then I downloaded the zip that is in this page:
 
 
 
 http://firefox.ztems.com/
 
 
 
 I put the update.zip on the MicroSD, upgraded and everything seemed fine, but 
 when it comes to the part of the animation of Fox, the phone restarts.
 
 
 
 Just before, for a split second, I have access to ADB, but as I rebooted I 
 could not install ClockworkMod. The recovery of ZTE not let me install any 
 ROM that I have on the MicroSD.
 
 
 
 This is quite frustrating. I think my phone will only serve me as a 
 paperweight.
 
 
 
 Could someone tell me what I can do?
 
 
 
 Thanks

I have the same problem now as Ernesto Acosta
I flashed image from that site : http://firefox.ztems.com/
And bye bye my ZTE :(
___
dev-b2g mailing list
dev-b2g@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-b2g


Re: [b2g] ZTE Open-- bricked when updating OS

2014-01-17 Thread benjamin . kim . nguyen
On Saturday, January 11, 2014 3:02:22 PM UTC+1, lgg2...@gmail.com wrote:
 Hello,
 
 
 
 Finally I have a working phone. All thanks to [paziusss] 
 (http://forum.xda-developers.com/showpost.php?p=49329491postcount=40) or 
 [pazos] (in spanish 
 http://gnulinuxvagos.es/topic/2187-he-brickeado-mi-zte-open-%C2%BFy-ahora-qu%C3%A9/#entry14579).
 
 
 
 A short version is for all of us that had android 3e recovery:
 
 1.- download the indicated file (see the links) and copy to the SD
 
 2.- update the downloaded file with android 3e and goes OK
 
 3.- reboot and IT WORKS.
 
 4.- Try several times until get the root with the initial rooting method 
 (aka. for version 1.0)
 
 5.- Place the CWM recovery (see instrucctions)
 
 
 
 Then ALL is working OK.
 
 
 
 Greetings to [paziusss]/[pazos] for this work.

Yeahhh for me too :)
Thank you !!
___
dev-b2g mailing list
dev-b2g@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-b2g


Re: [b2g] ZTE Open-- bricked when updating OS

2014-01-17 Thread benjamin . kim . nguyen
Thank you for the answer :) I finally made it !
___
dev-b2g mailing list
dev-b2g@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-b2g


RE: Writing to HBase

2013-12-12 Thread Benjamin Kim
Hi Philip,

I got this bit of code to work in the spark-shell using scala against our dev 
hbase cluster.

-bash-4.1$ export SPARK_CLASSPATH=$SPARK_CLASSPATH:/opt/cloudera/parcels/CDH/lib/hbase/hbase.jar:/opt/cloudera/parcels/CDH/lib/hbase/conf:/opt/cloudera/parcels/CDH/lib/hadoop/conf
-bash-4.1$ ./spark-shell
scala> import org.apache.hadoop.hbase.HBaseConfiguration
scala> import org.apache.hadoop.hbase.client._
scala> import org.apache.hadoop.hbase.util.Bytes
scala> val conf = HBaseConfiguration.create()
scala> val table = new HTable(conf, "my_items")
scala> val p = new Put(Bytes.toBytes("strawberry-fruit"))
scala> p.add(Bytes.toBytes("item"), Bytes.toBytes("item"), Bytes.toBytes("strawberry"))
scala> p.add(Bytes.toBytes("item"), Bytes.toBytes("category"), Bytes.toBytes("fruit"))
scala> p.add(Bytes.toBytes("item"), Bytes.toBytes("price"), Bytes.toBytes(0.35))
scala> table.put(p)

It put the new row "strawberry-fruit" into an hbase table.

Sorry, but I have another newbie question. How do I add those CLASSPATH 
dependencies when I want to compile a streaming jar in sbt so that the hbase 
configs are automatically used?

Thanks,
Ben
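
A minimal build.sbt sketch for that kind of streaming jar. All versions, the
Cloudera resolver, and the project name are assumptions (not confirmed by this
thread) and would need to match the cluster; the cluster configuration
directories (hbase-site.xml, core-site.xml) are still picked up from
SPARK_CLASSPATH at run time rather than being compiled into the jar.

// build.sbt: a sketch only; every version below is an assumption
name := "spark-streaming-hbase"

version := "0.1"

scalaVersion := "2.10.4"

resolvers += "cloudera" at "https://repository.cloudera.com/artifactory/cloudera-repos/"

libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-core"      % "0.9.1"           % "provided",
  "org.apache.spark"  %% "spark-streaming" % "0.9.1"           % "provided",
  "org.apache.hadoop" %  "hadoop-client"   % "2.0.0-cdh4.4.0"  % "provided",
  "org.apache.hbase"  %  "hbase"           % "0.94.6-cdh4.4.0" % "provided"
)

Marking the cluster jars as "provided" keeps them out of the assembled jar, so
the same SPARK_CLASSPATH used for the shell example above supplies them, and
the hbase configs along with them, at run time.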
Date: Thu, 5 Dec 2013 10:24:02 -0700
From: philip.og...@oracle.com
To: user@spark.incubator.apache.org
Subject: Re: Writing to HBase


  

  
  
Here's a good place to start:

http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201311.mbox/%3ccacyzca3askwd-tujhqi1805bn7sctguaoruhd5xtxcsul1a...@mail.gmail.com%3E

On 12/5/2013 10:18 AM, Benjamin Kim wrote:

  Does anyone have an example or some sort of starting point code when 
  writing from Spark Streaming into HBase?

  We currently stream ad server event log data using Flume-NG to tail log 
  entries, collect them, and put them directly into a HBase table. We would 
  like to do the same with Spark Streaming. But, we would like to do the data 
  massaging and simple data analysis before. This will cut down the steps in 
  prepping data and the number of tables for our data scientists and 
  real-time feedback systems.

  Thanks,
  Ben

RE: write data into HBase via spark

2013-12-06 Thread Benjamin Kim
Hi Philip/Hao,

I was wondering if there is a simple working example out there that I can just 
run and see work. Then, I can customize it for our needs. Unfortunately, this 
explanation still confuses me a little.

Here is a little about the environment we are working with. We have Cloudera's 
CDH 4.4.0 installed, and it comes with HBase 0.94.6. We get data streamed in 
using Flume-NG 1.4.0. All of this is managed using Cloudera Manager 4.7.2 to 
set up and configure these services.

If you need any more information or are able to help, I would be glad to 
accommodate.

Thanks,
Ben

Date: Fri, 6 Dec 2013 18:07:08 -0700
From: philip.og...@oracle.com
To: user@spark.incubator.apache.org
Subject: Re: write data into HBase via spark


  

  
  
Hao,

Thank you for the detailed response! (even if delayed!)

I'm curious to know what version of hbase you added to your pom file.

Thanks,
Philip



On 11/14/2013 10:38 AM, Hao REN wrote:



  Hi, Philip.

  Basically, we need PairRDDFunctions.saveAsHadoopDataset to do the job; as 
  HBase is not a fs, saveAsHadoopFile doesn't work.

  def saveAsHadoopDataset(conf: JobConf): Unit

  This function takes a JobConf parameter which should be configured. 
  Essentially, you need to set the output format and the name of the output 
  table.

  // step 1: JobConf setup

  // Note: the mapred package is used, instead of the mapreduce package 
  // which contains the new hadoop APIs.
  import org.apache.hadoop.hbase.mapred.TableOutputFormat
  import org.apache.hadoop.hbase.client._
  // ... some other settings

  val conf = HBaseConfiguration.create()

  // general hbase settings
  conf.set("hbase.rootdir", "hdfs://" + nameNodeURL + ":" + hdfsPort + "/hbase")
  conf.setBoolean("hbase.cluster.distributed", true)
  conf.set("hbase.zookeeper.quorum", hostname)
  conf.setInt("hbase.client.scanner.caching", 1)
  // ... some other settings

  val jobConfig: JobConf = new JobConf(conf, this.getClass)

  // Note: TableOutputFormat is used as deprecated code, because JobConf is 
  // an old hadoop API
  jobConfig.setOutputFormat(classOf[TableOutputFormat])
  jobConfig.set(TableOutputFormat.OUTPUT_TABLE, outputTable)

  // step 2: give your mapping

  // The last thing to do is map your local data schema to the hbase one.
  // Say our hbase schema is as below:
  //   row    cf:col_1    cf:col_2

  // And in spark, you have an RDD of triples, like (1, 2, 3), (4, 5, 6), ...

  // So you should map RDD[(Int, Int, Int)] to RDD[(ImmutableBytesWritable, Put)], 
  // where Put carries the mapping.

  // You can define a function used by RDD.map, for example:

  def convert(triple: (Int, Int, Int)) = {
    val p = new Put(Bytes.toBytes(triple._1))
    p.add(Bytes.toBytes("cf"), Bytes.toBytes("col_1"), Bytes.toBytes(triple._2))
    p.add(Bytes.toBytes("cf"), Bytes.toBytes("col_2"), Bytes.toBytes(triple._3))
    (new ImmutableBytesWritable, p)
  }

  // Suppose you have an RDD[(Int, Int, Int)] called localData; then writing 
  // data to hbase can be done by:

  new PairRDDFunctions(localData.map(convert)).saveAsHadoopDataset(jobConfig)

  Voilà. That's all you need. Hopefully, this simple example could help.

  Hao.

2013/11/13 Philip Ogren <philip.og...@oracle.com>

  Hao,

  If you have worked out the code and turn it into an example that you can 
  share, then please do!  This task is in my queue of things to do so any 
  helpful details that you uncovered would be most appreciated.

  Thanks,
  Philip

  On 11/13/2013 5:30 AM, Hao REN wrote:

  Ok, I worked it out.

  The following thread helps a lot.

Writing to HBase

2013-12-05 Thread Benjamin Kim
Does anyone have an example or some sort of starting point code when writing 
from Spark Streaming into HBase?
We currently stream ad server event log data using Flume-NG to tail log 
entries, collect them, and put them directly into a HBase table. We would like 
to do the same with Spark Streaming. But, we would like to do the data 
massaging and simple data analysis before. This will cut down the steps in 
prepping data and the number of tables for our data scientists and real-time 
feedback systems.
Thanks,
Ben
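
A hedged Scala sketch of what this could look like, combining Spark Streaming
with the saveAsHadoopDataset approach Hao describes earlier in this archive.
The socket source, the "events" table, the "cf" column family, and the Spark
0.9-style StreamingContext API are illustrative assumptions, not details from
this thread.

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapred.TableOutputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapred.JobConf
import org.apache.spark.SparkConf
import org.apache.spark.rdd.PairRDDFunctions
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamToHBase {
  // Builds a JobConf pointing at the target table; hbase-site.xml is picked up
  // from the classpath, as in the spark-shell example elsewhere in this thread.
  def hbaseJobConf(table: String): JobConf = {
    val jobConf = new JobConf(HBaseConfiguration.create(), getClass)
    jobConf.setOutputFormat(classOf[TableOutputFormat])
    jobConf.set(TableOutputFormat.OUTPUT_TABLE, table)
    jobConf
  }

  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("stream-to-hbase"), Seconds(10))

    // The source is an assumption: any DStream of "rowKey,value" lines would do
    // (for example a Flume receiver instead of this socket stream).
    val lines = ssc.socketTextStream("localhost", 9999)

    lines.map(_.split(",", 2)).filter(_.length == 2).foreachRDD { rdd =>
      // Any data massaging or simple analysis would go here, on the RDD,
      // before the write.
      val puts = rdd.map { case Array(rowKey, value) =>
        val p = new Put(Bytes.toBytes(rowKey))
        p.add(Bytes.toBytes("cf"), Bytes.toBytes("value"), Bytes.toBytes(value))
        (new ImmutableBytesWritable, p)
      }
      new PairRDDFunctions(puts).saveAsHadoopDataset(hbaseJobConf("events"))
    }

    ssc.start()
    ssc.awaitTermination()
  }
}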

Re: Decommissioning Nodes in Production Cluster.

2013-02-12 Thread Benjamin Kim
Hi,

I would like to add another scenario. What are the steps for removing a 
dead node when the server had a hard failure that is unrecoverable.

Thanks,
Ben

On Tuesday, February 12, 2013 7:30:57 AM UTC-8, sudhakara st wrote:

 The decommissioning process is controlled by an exclude file, which for 
 HDFS is set by the *dfs.hosts.exclude* property, and for MapReduce by 
 the *mapred.hosts.exclude* property. In most cases, there is one shared file, 
 referred to as the exclude file. This exclude file name should be specified as 
 a configuration parameter, *dfs.hosts.exclude*, at namenode startup.


 To remove nodes from the cluster:
 1. Add the network addresses of the nodes to be decommissioned to the 
 exclude file.

 2. Restart the MapReduce cluster to stop the tasktrackers on the nodes 
 being
 decommissioned.
 3. Update the namenode with the new set of permitted datanodes, with this
 command:
 % hadoop dfsadmin -refreshNodes
 4. Go to the web UI and check whether the admin state has changed to 
 “Decommission
 In Progress” for the datanodes being decommissioned. They will start 
 copying
 their blocks to other datanodes in the cluster.

 5. When all the datanodes report their state as “Decommissioned,” then all 
 the blocks
 have been replicated. Shut down the decommissioned nodes.
 6. Remove the nodes from the include file, and run:
 % hadoop dfsadmin -refreshNodes
 7. Remove the nodes from the slaves file.

  Decommissioning data nodes in small percentages (less than 2%) at a time does 
 not affect the cluster, but it is better to pause MR jobs before triggering 
 decommissioning, to ensure no tasks are running on the nodes being 
 decommissioned.
  If only a very small percentage of tasks is running on a decommissioning 
 node, they can be submitted to another tasktracker, but if the percentage of 
 queued jobs is larger than the threshold, there is a chance of job failure. 
 Once the 'hadoop dfsadmin -refreshNodes' command has been run and 
 decommissioning has started, you can resume the MR jobs.

 *Source : The Definitive Guide [Tom White]*



 On Tuesday, February 12, 2013 5:20:07 PM UTC+5:30, Dhanasekaran Anbalagan 
 wrote:

 Hi Guys,

  Is it recommended to remove one of the datanodes in a production cluster 
  by decommissioning that particular datanode? Please guide me.
  
 -Dhanasekaran,

 Did I learn something today? If not, I wasted it.
  


[jira] [Updated] (HBASE-6470) SingleColumnValueFilter with private fields and methods

2012-11-13 Thread Benjamin Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Kim updated HBASE-6470:


Attachment: SingleColumnValueFilter_HBASE_6470-trunk.patch

 SingleColumnValueFilter with private fields and methods
 ---

 Key: HBASE-6470
 URL: https://issues.apache.org/jira/browse/HBASE-6470
 Project: HBase
  Issue Type: Improvement
  Components: Filters
Affects Versions: 0.94.0
Reporter: Benjamin Kim
Assignee: Benjamin Kim
  Labels: patch
 Fix For: 0.96.0

 Attachments: SingleColumnValueFilter_HBASE_6470-trunk.patch


 Why are most fields and methods declared private in SingleColumnValueFilter?
 I'm trying to extend the functions of the SingleColumnValueFilter to support 
 complex column types such as JSON, Array, CSV, etc.
 But inheriting the SingleColumnValueFilter doesn't give any benefit, as I 
 have to rewrite the code. 
 I think all the private fields and methods could be made protected.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6470) SingleColumnValueFilter with private fields and methods

2012-11-13 Thread Benjamin Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Kim updated HBASE-6470:


Status: Open  (was: Patch Available)

 SingleColumnValueFilter with private fields and methods
 ---

 Key: HBASE-6470
 URL: https://issues.apache.org/jira/browse/HBASE-6470
 Project: HBase
  Issue Type: Improvement
  Components: Filters
Affects Versions: 0.94.0
Reporter: Benjamin Kim
Assignee: Benjamin Kim
  Labels: patch
 Fix For: 0.96.0

 Attachments: SingleColumnValueFilter_HBASE_6470-trunk.patch


 Why are most fields and methods declared private in SingleColumnValueFilter?
 I'm trying to extend the functions of the SingleColumnValueFilter to support 
 complex column types such as JSON, Array, CSV, etc.
 But inheriting the SingleColumnValueFilter doesn't give any benefit, as I 
 have to rewrite the code. 
 I think all the private fields and methods could be made protected.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6470) SingleColumnValueFilter with private fields and methods

2012-11-13 Thread Benjamin Kim (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13496036#comment-13496036
 ] 

Benjamin Kim commented on HBASE-6470:
-

oops I just did

 SingleColumnValueFilter with private fields and methods
 ---

 Key: HBASE-6470
 URL: https://issues.apache.org/jira/browse/HBASE-6470
 Project: HBase
  Issue Type: Improvement
  Components: Filters
Affects Versions: 0.94.0
Reporter: Benjamin Kim
Assignee: Benjamin Kim
  Labels: patch
 Fix For: 0.96.0

 Attachments: SingleColumnValueFilter_HBASE_6470-trunk.patch


 Why are most fields and methods declared private in SingleColumnValueFilter?
 I'm trying to extend the functions of the SingleColumnValueFilter to support 
 complex column types such as JSON, Array, CSV, etc.
 But inheriting the SingleColumnValueFilter doesn't give any benefit, as I 
 have to rewrite the code. 
 I think all the private fields and methods could be made protected.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6470) SingleColumnValueFilter with private fields and methods

2012-11-12 Thread Benjamin Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Kim updated HBASE-6470:


Fix Version/s: 0.96.0
 Assignee: Benjamin Kim
 Release Note: Changes private fields of SingleColumnValueFilter to 
protected for more subtle filtering of column values
   Status: Patch Available  (was: Open)

Changes private fields of SingleColumnValueFilter to protected for more subtle 
filtering of column values

 SingleColumnValueFilter with private fields and methods
 ---

 Key: HBASE-6470
 URL: https://issues.apache.org/jira/browse/HBASE-6470
 Project: HBase
  Issue Type: Improvement
  Components: Filters
Affects Versions: 0.94.0
Reporter: Benjamin Kim
Assignee: Benjamin Kim
  Labels: patch
 Fix For: 0.96.0


 Why are most fields and methods declared private in SingleColumnValueFilter?
 I'm trying to extend the functions of the SingleColumnValueFilter to support 
 complex column types such as JSON, Array, CSV, etc.
 But inheriting the SingleColumnValueFilter doesn't give any benefit, as I 
 have to rewrite the code. 
 I think all the private fields and methods could be made protected.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6470) SingleColumnValueFilter with private fields and methods

2012-11-12 Thread Benjamin Kim (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13495291#comment-13495291
 ] 

Benjamin Kim commented on HBASE-6470:
-

submitted a patch. just changed all private fields to protected

 SingleColumnValueFilter with private fields and methods
 ---

 Key: HBASE-6470
 URL: https://issues.apache.org/jira/browse/HBASE-6470
 Project: HBase
  Issue Type: Improvement
  Components: Filters
Affects Versions: 0.94.0
Reporter: Benjamin Kim
Assignee: Benjamin Kim
  Labels: patch
 Fix For: 0.96.0


 Why are most fields and methods declared private in SingleColumnValueFilter?
 I'm trying to extend the functions of the SingleColumnValueFilter to support 
 complex column types such as JSON, Array, CSV, etc.
 But inheriting the SingleColumnValueFilter doesn't give any benefit, as I 
 have to rewrite the code. 
 I think all the private fields and methods could be made protected.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6470) SingleColumnValueFilter with private fields and methods

2012-10-29 Thread Benjamin Kim (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486073#comment-13486073
 ] 

Benjamin Kim commented on HBASE-6470:
-

I'll come back to this first thing tomorrow and create a patch.

 SingleColumnValueFilter with private fields and methods
 ---

 Key: HBASE-6470
 URL: https://issues.apache.org/jira/browse/HBASE-6470
 Project: HBase
  Issue Type: Improvement
  Components: Filters
Affects Versions: 0.94.0
Reporter: Benjamin Kim
  Labels: patch

 Why are most fields and methods declared private in SingleColumnValueFilter?
 I'm trying to extend the functions of the SingleColumnValueFilter to support 
 complex column types such as JSON, Array, CSV, etc.
 But inheriting the SingleColumnValueFilter doesn't give any benefit, as I 
 have to rewrite the code. 
 I think all the private fields and methods could be made protected.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (MAPREDUCE-4718) MapReduce fails If I pass a parameter as a S3 folder

2012-10-10 Thread Benjamin Kim (JIRA)
Benjamin Kim created MAPREDUCE-4718:
---

 Summary: MapReduce fails If I pass a parameter as a S3 folder
 Key: MAPREDUCE-4718
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4718
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: job submission
Affects Versions: 1.0.3, 1.0.0
 Environment: Hadoop with default configurations
Reporter: Benjamin Kim


I'm running a wordcount MR as follows

hadoop jar WordCount.jar wordcount.WordCountDriver s3n://bucket/wordcount/input 
s3n://bucket/wordcount/output
 
s3n://bucket/wordcount/input is an S3 object that contains other input files.

However, I get the following NPE error:

12/10/02 18:56:23 INFO mapred.JobClient:  map 0% reduce 0%
12/10/02 18:56:54 INFO mapred.JobClient:  map 50% reduce 0%
12/10/02 18:56:56 INFO mapred.JobClient: Task Id : 
attempt_201210021853_0001_m_01_0, Status : FAILED
java.lang.NullPointerException
at 
org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.close(NativeS3FileSystem.java:106)
at java.io.BufferedInputStream.close(BufferedInputStream.java:451)
at java.io.FilterInputStream.close(FilterInputStream.java:155)
at org.apache.hadoop.util.LineReader.close(LineReader.java:83)
at 
org.apache.hadoop.mapreduce.lib.input.LineRecordReader.close(LineRecordReader.java:144)
at 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.close(MapTask.java:497)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:765)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

MR runs fine if I specify a more specific input path, such as 
s3n://bucket/wordcount/input/file.txt

MR fails if I pass an S3 folder as a parameter.


In summary,
This works:
 hadoop jar ./hadoop-examples-1.0.3.jar wordcount /user/hadoop/wordcount/input/ 
s3n://bucket/wordcount/output/

This doesn't work:
 hadoop jar ./hadoop-examples-1.0.3.jar wordcount s3n://bucket/wordcount/input/ 
s3n://bucket/wordcount/output/

(both input paths are directories)



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-6470) SingleColumnValueFilter with private fields and methods

2012-07-28 Thread Benjamin Kim (JIRA)
Benjamin Kim created HBASE-6470:
---

 Summary: SingleColumnValueFilter with private fields and methods
 Key: HBASE-6470
 URL: https://issues.apache.org/jira/browse/HBASE-6470
 Project: HBase
  Issue Type: Improvement
  Components: filters
Affects Versions: 0.94.0
Reporter: Benjamin Kim
 Fix For: 0.94.0


Why are most fields and methods declared private in SingleColumnValueFilter?

I'm trying to extend the functions of the SingleColumnValueFilter to support 
complex column types such as JSON, Array, CSV, etc.

But inheriting the SingleColumnValueFilter doesn't give any benefit, as I have 
to rewrite the code. 

I think all the private fields and methods could be made protected.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6288) In hbase-daemons.sh, description of the default backup-master file path is wrong

2012-07-13 Thread Benjamin Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Kim updated HBASE-6288:


Attachment: HBASE-6288-trunk.patch
HBASE-6288-94.patch
HBASE-6288-92-1.patch
HBASE-6288-92.patch

 In hbase-daemons.sh, description of the default backup-master file path is 
 wrong
 

 Key: HBASE-6288
 URL: https://issues.apache.org/jira/browse/HBASE-6288
 Project: HBase
  Issue Type: Task
  Components: master, scripts, shell
Affects Versions: 0.92.0, 0.92.1, 0.94.0
Reporter: Benjamin Kim
 Attachments: HBASE-6288-92-1.patch, HBASE-6288-92.patch, 
 HBASE-6288-94.patch, HBASE-6288-trunk.patch


 In hbase-daemons.sh, description of the default backup-master file path is 
 wrong
 {code}
 #   HBASE_BACKUP_MASTERS File naming remote hosts.
 # Default is ${HADOOP_CONF_DIR}/backup-masters
 {code}
 it says the default backup-masters file path is at a hadoop-conf-dir, but 
 shouldn't this be HBASE_CONF_DIR?
 also adding following lines to conf/hbase-env.sh would be helpful
 {code}
 # File naming hosts on which backup HMaster will run.  
 $HBASE_HOME/conf/backup-masters by default.
 export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6288) In hbase-daemons.sh, description of the default backup-master file path is wrong

2012-07-13 Thread Benjamin Kim (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413921#comment-13413921
 ] 

Benjamin Kim commented on HBASE-6288:
-

It took a while since I was gone on vacation. Here are the patches =)

 In hbase-daemons.sh, description of the default backup-master file path is 
 wrong
 

 Key: HBASE-6288
 URL: https://issues.apache.org/jira/browse/HBASE-6288
 Project: HBase
  Issue Type: Task
  Components: master, scripts, shell
Affects Versions: 0.92.0, 0.92.1, 0.94.0
Reporter: Benjamin Kim
 Attachments: HBASE-6288-92-1.patch, HBASE-6288-92.patch, 
 HBASE-6288-94.patch, HBASE-6288-trunk.patch


 In hbase-daemons.sh, description of the default backup-master file path is 
 wrong
 {code}
 #   HBASE_BACKUP_MASTERS File naming remote hosts.
 # Default is ${HADOOP_CONF_DIR}/backup-masters
 {code}
 it says the default backup-masters file path is at a hadoop-conf-dir, but 
 shouldn't this be HBASE_CONF_DIR?
 also adding following lines to conf/hbase-env.sh would be helpful
 {code}
 # File naming hosts on which backup HMaster will run.  
 $HBASE_HOME/conf/backup-masters by default.
 export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-6288) In hbase-daemons.sh, description of the default backup-master file path is wrong

2012-06-27 Thread Benjamin Kim (JIRA)
Benjamin Kim created HBASE-6288:
---

 Summary: In hbase-daemons.sh, description of the default 
backup-master file path is wrong
 Key: HBASE-6288
 URL: https://issues.apache.org/jira/browse/HBASE-6288
 Project: HBase
  Issue Type: Task
  Components: master, scripts, shell
Affects Versions: 0.94.0, 0.92.1, 0.92.0
Reporter: Benjamin Kim


In hbase-daemons.sh, description of the default backup-master file path is wrong

{code}
#   HBASE_BACKUP_MASTERS File naming remote hosts.
# Default is $\{HADOOP_CONF_DIR\}/backup-masters
{code}

it says the default backup-masters file path is at a hadoop-conf-dir, but 
shouldn't this be HBASE_CONF_DIR?

also adding following lines to conf/hbase-env.sh would be helpful
{code}
# File naming hosts on which backup HMaster will run.  
$HBASE_HOME/conf/backup-masters by default.
export HBASE_BACKUP_MASTERS=$\{HBASE_HOME\}/conf/backup-masters
{code}



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6288) In hbase-daemons.sh, description of the default backup-master file path is wrong

2012-06-27 Thread Benjamin Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Kim updated HBASE-6288:


Description: 
In hbase-daemons.sh, description of the default backup-master file path is wrong

{code}
#   HBASE_BACKUP_MASTERS File naming remote hosts.
# Default is ${HADOOP_CONF_DIR}/backup-masters
{code}

it says the default backup-masters file path is at a hadoop-conf-dir, but 
shouldn't this be HBASE_CONF_DIR?

also adding following lines to conf/hbase-env.sh would be helpful
{code}
# File naming hosts on which backup HMaster will run.  
$HBASE_HOME/conf/backup-masters by default.
export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters
{code}



  was:
In hbase-daemons.sh, description of the default backup-master file path is wrong

{code}
#   HBASE_BACKUP_MASTERS File naming remote hosts.
# Default is $\{HADOOP_CONF_DIR\}/backup-masters
{code}

it says the default backup-masters file path is at a hadoop-conf-dir, but 
shouldn't this be HBASE_CONF_DIR?

also adding following lines to conf/hbase-env.sh would be helpful
{code}
# File naming hosts on which backup HMaster will run.  
$HBASE_HOME/conf/backup-masters by default.
export HBASE_BACKUP_MASTERS=$\{HBASE_HOME\}/conf/backup-masters
{code}




 In hbase-daemons.sh, description of the default backup-master file path is 
 wrong
 

 Key: HBASE-6288
 URL: https://issues.apache.org/jira/browse/HBASE-6288
 Project: HBase
  Issue Type: Task
  Components: master, scripts, shell
Affects Versions: 0.92.0, 0.92.1, 0.94.0
Reporter: Benjamin Kim

 In hbase-daemons.sh, description of the default backup-master file path is 
 wrong
 {code}
 #   HBASE_BACKUP_MASTERS File naming remote hosts.
 # Default is ${HADOOP_CONF_DIR}/backup-masters
 {code}
 it says the default backup-masters file path is at a hadoop-conf-dir, but 
 shouldn't this be HBASE_CONF_DIR?
 also adding following lines to conf/hbase-env.sh would be helpful
 {code}
 # File naming hosts on which backup HMaster will run.  
 $HBASE_HOME/conf/backup-masters by default.
 export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6132) ColumnCountGetFilter & PageFilter not working with FilterList

2012-05-30 Thread Benjamin Kim (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Kim updated HBASE-6132:


Description: 
Thanks to Anoop and Ramkrishna, here's what we found with FilterList

If I use FilterList to include ColumnCountGetFilter among other filters, the 
returning Result has no keyvalues.

This problem seems to occur when the specified column count is less than the 
actual number of existing columns.

Also same problem arises with PageFilter

Following is the code of the problem:

{code}
Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "test");
Get get = new Get(Bytes.toBytes("test1"));
FilterList filterList = new FilterList();
filterList.addFilter(new ColumnCountGetFilter(100));   
get.setFilter(filterList);
Result r = table.get(get);
System.out.println(r.size()); // prints zero
{code}

  was:
Thanks to Anoop and Ramkrishna, here's what we found with FilterList

If I use FilterList to include ColumnCountGetFilter among other filters, the 
returning Result has no keyvalues.

This problem seems to occur when the specified column count is less than the 
actual number of existing columns.

Also same problem arises with PageFilter

Following is the code of the problem:

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "test");
Get get = new Get(Bytes.toBytes("test1"));
FilterList filterList = new FilterList();
filterList.addFilter(new ColumnCountGetFilter(100));   
get.setFilter(filterList);
Result r = table.get(get);
System.out.println(r.size()); // prints zero


 ColumnCountGetFilter & PageFilter not working with FilterList
 -

 Key: HBASE-6132
 URL: https://issues.apache.org/jira/browse/HBASE-6132
 Project: HBase
  Issue Type: Bug
  Components: filters
Affects Versions: 0.92.0, 0.92.1, 0.94.0
 Environment: Cent OS 5.5 distributed hbase cluster. Hadoop 1.0.0, 
 zookeeper 3.4.3
Reporter: Benjamin Kim

 Thanks to Anoop and Ramkrishna, here's what we found with FilterList
 If I use FilterList to include ColumnCountGetFilter among other filters, the 
 returning Result has no keyvalues.
 This problem seems to occur when the specified column count is less than the 
 actual number of existing columns.
 Also same problem arises with PageFilter
 Following is the code of the problem:
 {code}
 Configuration conf = HBaseConfiguration.create();
 HTable table = new HTable(conf, "test");
 Get get = new Get(Bytes.toBytes("test1"));
 FilterList filterList = new FilterList();
 filterList.addFilter(new ColumnCountGetFilter(100));   
 get.setFilter(filterList);
 Result r = table.get(get);
 System.out.println(r.size()); // prints zero
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-6132) ColumnCountGetFilter & PageFilter not working with FilterList

2012-05-29 Thread Benjamin Kim (JIRA)
Benjamin Kim created HBASE-6132:
---

 Summary: ColumnCountGetFilter & PageFilter not working with 
FilterList
 Key: HBASE-6132
 URL: https://issues.apache.org/jira/browse/HBASE-6132
 Project: HBase
  Issue Type: Bug
  Components: filters
Affects Versions: 0.94.0, 0.92.1, 0.92.0
 Environment: Cent OS 5.5 distributed hbase cluster. Hadoop 1.0.0, 
zookeeper 3.4.3
Reporter: Benjamin Kim


Thanks to Anoop and Ramkrishna, here's what we found with FilterList

If I use FilterList to include ColumnCountGetFilter among other filters, the 
returning Result has no keyvalues.

This problem seems to occur when the specified column count is less than the 
actual number of existing columns.

Also same problem arises with PageFilter

Following is the code of the problem:

Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, "test");
Get get = new Get(Bytes.toBytes("test1"));
FilterList filterList = new FilterList();
filterList.addFilter(new ColumnCountGetFilter(100));   
get.setFilter(filterList);
Result r = table.get(get);
System.out.println(r.size()); // prints zero

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



