I believe it's behaving as expected. It will spawn 64 containers because
that's how much memory you have available. The vcores setting isn't strictly
enforced, since CPU usage can be elastic. This blog post from Cloudera explains
how to enforce CPU limits using cgroups.
You can use the SequenceFileLoader from the piggybank.
http://pig.apache.org/docs/r0.12.0/api/org/apache/pig/piggybank/storage/SequenceFileLoader.html
On Tue, May 20, 2014 at 2:46 AM, abhishek dodda
abhishek.dod...@gmail.com wrote:
Hi All,
I have trouble building code for this project.
)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:331)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
On Tue, May 20, 2014 at 5:41 AM, Pradeep Gollakota
pradeep...@gmail.com wrote:
You can use
Hi All,
I'm trying to work with NIO in Java 7, and I'm not able to access methods
that are declared in the super class.
(.getPath (java.nio.file.FileSystems/getDefault) "/")
The above code throws the following exception:
Exception in thread "main" java.lang.IllegalArgumentException: No matching
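The usual cause of this "No matching method" error is that getPath is a varargs method, and Clojure's reflective interop does not fill in the trailing String... automatically; you have to pass an explicit empty array, e.g. (.getPath (java.nio.file.FileSystems/getDefault) "/" (into-array String [])). A plain-Java sketch of the same signature issue (in Java the compiler supplies the empty array for you):

```java
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Path;

public class GetPathDemo {
    public static void main(String[] args) {
        FileSystem fs = FileSystems.getDefault();
        // getPath(String first, String... more) is varargs;
        // the Java compiler silently supplies the empty array here:
        Path p = fs.getPath("/");
        // ...which is what a reflective caller (e.g. Clojure interop)
        // must pass explicitly:
        Path q = fs.getPath("/", new String[0]);
        System.out.println(p.equals(q));
    }
}
```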
Check out
http://archive.cloudera.com/cdh/3/pig/piglatin_ref2.html#REGEX_EXTRACT
This may suit your needs
On Mon, May 12, 2014 at 12:16 AM, kartik manocha koolkarti...@gmail.com wrote:
Hi,
I am new to Pig and facing an issue in filtering out a string from a field;
the scenario is mentioned below.
regex, so
that it prints the string before that.
Thanks,
Kartik
On Mon, May 12, 2014 at 2:03 PM, Pradeep Gollakota pradeep...@gmail.com
wrote:
Check out
http://archive.cloudera.com/cdh/3/pig/piglatin_ref2.html#REGEX_EXTRACT
This may suit your needs
On Mon, May 12, 2014 at 12
What's the LoadFunc you're using?
On Thu, Apr 24, 2014 at 9:28 AM, Swapnil Shinde swapnilushi...@gmail.com wrote:
I am facing very weird problem while multiplication.
Pig simplified code snippet-
A = LOAD 'file_A' AS (colA1 : double, colA2 : double);
describe A;
*A: {colA1:
might not be cast-able to
numeric for one or more records.
On 24 April 2014 22:24, Pradeep Gollakota pradeep...@gmail.com
wrote:
What's the LoadFunc you're using?
On Thu, Apr 24, 2014 at 9:28 AM, Swapnil Shinde
swapnilushi...@gmail.com
wrote:
I am facing very weird
Pig is a little too smart when dealing with data. It has a feature called
split combination. If you set it to false, you should see more mappers.
SET pig.noSplitCombination true;
On Tue, Apr 22, 2014 at 12:14 PM, Patcharee Thongtra
patcharee.thong...@uni.no wrote:
Hi,
I wrote a custom
What is the storage func you're using? My guess is that there is some
shared state in the storage func. Take a look at this SO post that deals
with shared state in store functions.
http://stackoverflow.com/questions/20225842/apache-pig-append-one-dataset-to-another-one/20235592#20235592.
The reason why
That is because you're calling REPLACE on a bag of tuples and not a string.
What you would want to do is write a UDF (suggested name JOIN_ON), that
takes as an argument a join char and will join all the tuples in the bag by
the join char.
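The core of such a UDF (JOIN_ON is just a suggested name, not an existing builtin; I believe newer Pig versions also ship a BagToString builtin that does essentially this) is nothing more than a delimiter join over the bag's tuple values. A plain-Java sketch of the back-end logic:

```java
import java.util.Arrays;
import java.util.List;

public class JoinOnSketch {
    // Hypothetical core of a JOIN_ON UDF: join the bag's tuple
    // values with the given join char.
    static String joinOn(String joinChar, List<String> bagValues) {
        return String.join(joinChar, bagValues);
    }

    public static void main(String[] args) {
        System.out.println(joinOn("|", Arrays.asList("foo", "bar", "baz")));
    }
}
```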
On Mon, Apr 7, 2014 at 12:31 PM, Krishnan Narayanan
I don't understand what you're trying to do from your example.
If you perform a cross on the data you have, the output will be the
following:
(1,2,3,4,5,10,11)
(1,2,3,4,5,10,11)
(1,2,3,4,5,10,11)
(1,2,4,5,7,10,11)
(1,2,4,5,7,10,11)
(1,2,4,5,7,10,11)
(1,5,7,8,9,10,11)
(1,5,7,8,9,10,11)
Subject: Re: Any way to join two aliases without using CROSS
The output I would like to see is
(1,2,3,4,5,10,11)
(1,2,4,5,7,10,12)
(1,5,7,8,9,10,13)
On Tue, Mar 25, 2014 at 3:58 PM, Pradeep Gollakota pradeep...@gmail.com
wrote:
I don't understand what you're trying to do from your
Unfortunately, the Enumerate UDF from DataFu would not work in this case.
The UDF works on Bags and in this case, we want to enumerate a relation.
Implementing RANK is a very tricky thing to do correctly. I'm not even sure
if it's doable just by using Pig operators, UDFs or macros. Best option is
According to the docs, it should work.
http://pig.apache.org/docs/r0.12.0/basic.html#register
Stupid question, but is the path correct? Is it on HDFS or local disk?
On Tue, Mar 11, 2014 at 8:36 PM, Anthony Alleven aalle...@iastate.edu wrote:
Hello,
I am trying to use a User Defined Function
Is there a firewall that's blocking connections on port 9092? Also, the
broker list should be comma separated.
On Tue, Mar 11, 2014 at 9:02 AM, A A andthereitg...@hotmail.com wrote:
Sorry, one of the brokers was down. Brought it back up. Tried the
following
Best way to examine this is to use the EXPLAIN operator. It will show you
the physical MapReduce plan and what features are being executed in each
phase.
On Tue, Mar 11, 2014 at 11:29 AM, ey-chih Chow eyc...@gmail.com wrote:
Hi,
I got a question on a pig script that has a single input with
I forgot to mention that there are also other 3rd party libraries that make
examining the physical plan easier. For example take a look at
Lipstick (https://github.com/Netflix/Lipstick) from Netflix.
On Tue, Mar 11, 2014 at 11:41 AM, Pradeep Gollakota pradeep...@gmail.com wrote:
Best way to examine
I believe the prefix filter does a full table scan. What you want to do for
fast seeks is provide a 'startKey' and 'endKey'. You can mimic what the
prefix filter does by setting startKey = prefix and endKey = prefix~ ('~' is
the last printable ASCII char).
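Sketched with plain strings (HBase actually compares raw bytes, but for printable-ASCII keys String.compareTo agrees with the byte order), the [prefix, prefix + '~') range brackets exactly the prefixed keys:

```java
public class PrefixRangeDemo {
    public static void main(String[] args) {
        String prefix = "user123_";        // hypothetical row-key prefix
        String startKey = prefix;
        String endKey = prefix + "~";      // '~' is the last printable ASCII char

        String[] keys = {"user123_a", "user123_zzz", "user124_a", "user122_z"};
        for (String key : keys) {
            // a key falls in the scan range iff it carries the prefix
            boolean inRange = key.compareTo(startKey) >= 0
                           && key.compareTo(endKey) < 0;
            System.out.println(key + " -> " + inRange);
        }
    }
}
```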
On Sat, Mar 8, 2014 at 8:37 AM, Parkirat
Kundera also has support for HBase as far as I'm aware.
On Thu, Mar 6, 2014 at 8:13 PM, jeevi tesh jeevitesh...@gmail.com wrote:
Hi all,
I'm new to hbase in search of jpa kind of package for hbase to push the
data into hbase system. Started trying with stargate where i found very
strict
Where exactly are you getting duplicates? I'm not sure I understand your
question. Can you give an example please?
On Thu, Feb 27, 2014 at 11:15 AM, Anastasis Andronidis
andronat_...@hotmail.com wrote:
Hello everyone,
I have a foreach statement and inside of it, I use an order by. After the
,ch.cern.sam.ROC_CRITICAL,0.0,CREAM-CE)
(20131209,AEGIS04-KG,ch.cern.sam.ROC_CRITICAL,0.0,SRMv2)
(20131209,AM-02-SEUA,ch.cern.sam.ROC_CRITICAL,0.0,CREAM-CE)
.
.
.
On 27 Feb 2014, at 10:20 PM, Pradeep Gollakota pradeep...@gmail.com
wrote:
Where exactly are you getting duplicates? I'm not sure I
Hi Neha,
6. It seems like #4 can be avoided by using Map<TopicPartition,
Long> or Map<TopicPartition, TopicPartitionOffset> as the argument type.
How? lastCommittedOffsets() is independent of positions(). I'm not sure I
understood your suggestion.
I think of subscription as you're subscribing
feedback on which APIs
should have different arguments/return types?
2. lastCommittedOffsets() does what you said in the javadoc.
Thanks,
Neha
On Tue, Feb 11, 2014 at 11:45 AM, Pradeep Gollakota pradeep...@gmail.com
wrote:
Hi Jay,
I apologize for derailing the conversation about
do you think?
-Jay
On Mon, Feb 10, 2014 at 3:37 PM, Pradeep Gollakota pradeep...@gmail.com
wrote:
WRT to hierarchical topics, I'm referring to
KAFKA-1175 (https://issues.apache.org/jira/browse/KAFKA-1175).
I would just like to think through the implications for the Consumer API
+1 Jun.
On Mon, Feb 10, 2014 at 2:17 PM, Sriram Subramanian
srsubraman...@linkedin.com wrote:
+1 on Jun's suggestion.
On 2/10/14 2:01 PM, Jun Rao jun...@gmail.com wrote:
I actually prefer to see those at INFO level. The reason is that the
config
system in an application can be complex.
uniquely identifies a partition of a topic
Thanks,
Neha
On Mon, Feb 10, 2014 at 12:36 PM, Pradeep Gollakota pradeep...@gmail.com
wrote:
Couple of very quick thoughts.
1. +1 about renaming commit(...) and commitAsync(...)
2. I'd also like to extend the above for the poll() method as well
Have you read this part of the documentation?
http://kafka.apache.org/documentation.html#semantics
Just wondering if that solves your use case.
On Mon, Feb 10, 2014 at 9:11 AM, Garry Turkington
g.turking...@improvedigital.com wrote:
Hi,
I've been doing some prototyping on Kafka for a few
Couple of very quick thoughts.
1. +1 about renaming commit(...) and commitAsync(...)
2. I'd also like to extend the above for the poll() method as well. poll()
and pollWithTimeout(long, TimeUnit)?
3. Have you guys given any thought around how this API would be used with
hierarchical topics?
4.
I'm not sure I understand the use case for something like that. I'm pretty
sure the YARN API doesn't support it though. What you might be able to do
is to tear down your existing container and request a new one.
On Mon, Feb 10, 2014 at 10:28 AM, Thomas Bentsen t...@bentzn.com wrote:
I am no
[
https://issues.apache.org/jira/browse/KAFKA-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895123#comment-13895123
]
Pradeep Gollakota commented on KAFKA-1226:
--
[~jvanremoortere] Can you either add
[
https://issues.apache.org/jira/browse/KAFKA-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895138#comment-13895138
]
Pradeep Gollakota commented on KAFKA-1226:
--
[~jvanremoortere] Sweet! Thanks.
Can
Hi All,
In the blog describing the coprocessor there was sequence diagram walking
through the lifecycle of a Get.
https://blogs.apache.org/hbase/mediaresource/60b135e5-04c6-4197-b262-e7cd08de784b
I'm wondering if the lifecycle of a Put follows the same sequence.
Specifically for my use case, I'm
Thank you!
On Tue, Jan 21, 2014 at 4:52 PM, Ted Yu yuzhih...@gmail.com wrote:
bq. Does the client wait until the postPut() is executed?
Yes.
Please see HRegion#doMiniBatchMutation()
In 0.94, it is around line 2521.
Cheers
On Tue, Jan 21, 2014 at 4:32 PM, Pradeep Gollakota pradeep
It's strange that it's being executed on the Map-side. The group is a
reduce side operation (I'm assuming) and it seems that the nested foreach
would happen on Reduce-side after grouping. Have you looked at the MR plan
to verify that it is being executed Map-side?
One thing to try might be to
Hi All,
I have a use case where I need to replicate data from HBase into
Elasticsearch. I've found two implementations of an HBase River that
accomplishes this.
One uses timestamps to do a timerange scan of the table (since last sync)
and replicates data across. For many reasons this is not
Did you mean to say timeout instead of spill? Spills don't cause task
failures (unless a spill fails). Default timeout for a task is 10 min. It
would be very helpful to have a stack trace to look at, at the very least.
On Fri, Jan 10, 2014 at 7:53 AM, Zebeljan, Nebojsa
I lied in my previous email... it doesn't look like Phoenix uses HIndex.
On Sun, Dec 22, 2013 at 3:53 PM, Pradeep Gollakota pradeep...@gmail.com wrote:
Take a look at this library from Huawei. They went a step further to
colocate the index with the primary partition. I believe Phoenix uses
Do you know if machines 19-23 are on a different rack? It seems to me that
your problem might be a networking problem. Whether it is hardware,
configuration or something else entirely, I'm not sure. It might be
worthwhile to talk to your systems administrator to see why pings to these
machines are
, Pradeep Gollakota pradeep...@gmail.com
wrote:
Do you know if machines 19-23 are on a different rack? It seems to me
that
your problem might be a networking problem. Whether it is hardware,
configuration or something else entirely, I'm not sure. It might be
worthwhile to talk to your systems
This is kinda tangential, but for very very common dependencies such as
guava, jackson, etc. would it make sense to use a shaded jar so as not to
affect user dependencies?
On Mon, Dec 16, 2013 at 7:47 PM, Ted Yu yuzhih...@gmail.com wrote:
Please try out patch v2 from HBASE-10174
Thanks
On
It seems like what you're asking for is Versioned Schema management. Pig is
not designed for that. Pig is only a scripting language to manipulate
datasets.
I'd recommend you look into Thrift, Protocol Buffers and Avro. They are
compact serialization libraries that do versioned schema management.
Did you recently upgrade to 0.96? This is a problem I faced with mismatched
clients connecting to a 0.96 cluster. Starting in that version, the root
node for ZooKeeper changed from /hbase to /hbase-unsecure (if in unsecure
mode).
On Thu, Dec 12, 2013 at 10:47 PM, Sandeep L
Hi All,
I'm trying to understand how different configurations will affect
performance for my use cases. My table has the following schema. I'm
storing event logs in a single column family. The row key is in
the format [company][timestamp][uuid].
My access pattern is fairly simple.
[
https://issues.apache.org/jira/browse/KAFKA-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Pradeep Gollakota updated KAFKA-1175:
-
Issue Type: New Feature (was: Bug)
Hierarchical Topics
[
https://issues.apache.org/jira/browse/KAFKA-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13843861#comment-13843861
]
Pradeep Gollakota commented on KAFKA-1175:
--
I'm very interested in this feature
[
https://issues.apache.org/jira/browse/KAFKA-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13844002#comment-13844002
]
Pradeep Gollakota commented on KAFKA-1175:
--
In the proposal, [~jkreps] talks
It's not valid PigLatin...
The Grunt shell doesn't let you try out functions and UDFs the way you're
trying to use them.
A = LOAD 'data' USING PigStorage() as (ip: chararray);
B = FOREACH A GENERATE REGEX_EXTRACT(ip, '(.*):(.*)', 1);
DUMP B;
You always have to load a dataset and work
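For what it's worth, REGEX_EXTRACT is a thin wrapper over java.util.regex; the same extraction in plain Java (same pattern and group index as the Pig snippet above, sample IP is made up):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExtractDemo {
    public static void main(String[] args) {
        // REGEX_EXTRACT(ip, '(.*):(.*)', 1) pulls out capture group 1:
        Matcher m = Pattern.compile("(.*):(.*)").matcher("192.168.0.1:8080");
        if (m.matches()) {
            System.out.println(m.group(1)); // the part before the colon
        }
    }
}
```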
I tried the following script (not exactly the same) and it worked correctly
for me.
businesses = LOAD 'dataset' using PigStorage(',') AS (a, b, c,
business_id: chararray, lat: double, lng: double);
locations = FOREACH businesses GENERATE business_id, lat, lng;
STORE locations INTO 'locations.tsv';
Jacob Perkins submitted a POC patch. However, my guess is that this will
not be included in the 0.13 release. There's still quite a bit of work to
be done and we'll be working on it. You can track the progress at
https://issues.apache.org/jira/browse/PIG-3453
On Mon, Dec 2, 2013 at 9:51 AM,
In addition to Impala and Pheonix, I'm going to throw PrestoDB into the
mix. :)
http://prestodb.io/
On Mon, Dec 2, 2013 at 10:58 AM, Doug Meil doug.m...@explorysmedical.com wrote:
You are going to want to figure out a rowkey (or a set of tables with
rowkeys) to restrict the number of I/O's.
defines your row key. You should lead
with the columns that you'll filter against most frequently. Then, take a
look at adding secondary indexes to speedup queries against other columns.
Thanks,
James
On Mon, Dec 2, 2013 at 11:01 AM, Pradeep Gollakota pradeep...@gmail.com
wrote
This question belongs on the user list. The dev list is meant for Pig
developers to discuss issues related to the development of Pig. I’ve
forwarded this to the user list. It also helps tremendously if you format
your data and scripts nicely as they’re much easier to read and understand.
I use a
I don't think there's an out of the box solution for it. But it's fairly
trivial to do with a UDF
On Nov 15, 2013 3:19 PM, Jerry Lam chiling...@gmail.com wrote:
Hi Pig users,
Do you know how to add a key value pair into a map?
For instance, a relation of A contains a document:map[] for each
I'm a little curious as to how you would be able to use no_of_days as a
column qualifier at all... it changes everyday for all users right? So how
will you keep your table updated?
On Thu, Nov 14, 2013 at 9:07 AM, Jean-Marc Spaggiari
jean-m...@spaggiari.org wrote:
You can use your no_day as a
Use the ILLUSTRATE or EXPLAIN keywords to look at the details of the
physical execution plan... from first glance it doesn't look like you'd
need a 2nd job to do the joins, but if you can post the output of
ILLUSTRATE/EXPLAIN, we can look into it.
On Mon, Nov 11, 2013 at 4:36 PM, Dexin Wang
Each element in A is not a Bag. A relation is a collection of tuples (just
like a bag). So each element in A is a tuple whose first element is a Bag.
If you want to order the tuples by id, you have to extract them from the
bag first.
A = LOAD 'data' ...;
B = FOREACH A GENERATE FLATTEN($0);
C =
Really dumb question but... when running in MapReduce mode, is your input
file on HDFS?
On Tue, Nov 5, 2013 at 9:17 AM, Sameer Tilak ssti...@live.com wrote:
Dear Pig experts,
I have the following Pig script that works perfectly in local mode.
However, in the mapreduce mode I get AU as :
CROSS is grossly expensive to compute so I’m not surprised that the
performance isn’t good enough. Are you repeating your LOAD and FILTER ops for
every one of your small files? At the end of the day, what is it that
you’re trying to accomplish? Find the 1 row you’re after and attach to all
rows in
but originally it
stores in different environment. I pull the data from there and load into
HDFS. Anyway because of our architecture I can't change it right now.
Thanks
Best regards...
On Tue, Nov 5, 2013 at 7:43 PM, Pradeep Gollakota pradeep...@gmail.com
wrote:
CROSS is grossly expensive
[
https://issues.apache.org/jira/browse/PIG-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13813136#comment-13813136
]
Pradeep Gollakota commented on PIG-3453:
[~thedatachef] Wow... This is a great start
This is most likely because you haven't defined the outputSchema method of
the UDF. The AS keyword merges the schema generated by the UDF with the
user specified schema. If the UDF does not override the method and specify
the output schema, it is considered null and you will not be able to use AS
I think you’re misunderstanding how HBaseStorage works. HBaseStorage uses
the HBaseInputFormat underneath the hood. The number of map tasks that are
spawned is dependent on the number of regions you have. The map tasks are
spawned such that the tasks are local to the regions they’re reading from.
If I understood your question correctly, given the following input:
main_data.txt
{id: foo, some_field: 12354, score: 0}
{id: foobar, some_field: 12354, score: 0}
{id: baz, some_field: 12345, score: 0}
score_data.txt
{id: foo, score: 1}
{id: foobar, score: 20}
you want the following output
Are you able to post your UDF (or at least a sanitized version)?
On Wed, Oct 30, 2013 at 10:46 AM, Henning Kropp henning.kr...@gmail.com wrote:
Hi,
thanks for your reply. I read about the expected behavior on the front-end
and I am getting the NPE on the back-end. The Mappers log the
Great question. There seems to be some confusion about how DISTINCT
operates. I remembered (and thankfully found) this message
(http://mail-archives.apache.org/mod_mbox/pig-user/201309.mbox/%3CCAE7pYjar3hX4Kp%2B5SQz3sr%3DvjxfQDVq_6Yi4vh9KgfOj3dzTGw%40mail.gmail.com%3E)
that explains the behavior.
As
Not really...
In my experience, Pig is only good at dealing with tabular data. The type
of graphical data you have is not workable in Pig. Have you considered
using a Graph database (such as Neo4j)? These databases are highly
optimized for doing the type of path queries you're looking for.
On
A replicated cross (implemented as a replicated join on a synthetic key) is
probably your best bet.
On Wed, Oct 23, 2013 at 2:09 PM, Daniel Dai da...@hortonworks.com wrote:
Can you do a cross?
On Mon, Oct 21, 2013 at 2:21 PM, Serega Sheypak serega.shey...@gmail.com
wrote:
Hi, I have two
I think you want to use option 2. It preserves the data that's on those data
nodes.
P.S: The Hadoop mailing list might be a better list to post this (type) of
question on. This is not really HBase specific.
On Tue, Oct 22, 2013 at 7:53 AM, satish satishkorit...@gmail.com wrote:
Hi All,
We
Please ask this on the Impala mailing list. This is not an HBase (or Hive)
question.
On Tue, Oct 22, 2013 at 1:13 AM, Garg, Rinku rinku.g...@fisglobal.com wrote:
Hi All,
We have installed Cloudera hadoop-2.0.0-mr1-cdh4.2.0 with
hive-0.10.0-cdh4.2.0. Both are working as desired. We can run any
Hi All,
Thank you so much for your replies!
For my particular use case (tail -f multiple files and write the entries
into a db), I'm using pmap to process each file in a separate thread and
for each file, I'm using doseq to write to db. It seems to be working well
(though I still need to
This question does not belong in the Pig mailing list. Please ask on the
elephant bird mailing list at
https://groups.google.com/forum/?fromgroups#!forum/elephantbird-dev
On Thu, Oct 17, 2013 at 4:02 PM, Zhu Wayne zhuw.chic...@gmail.com wrote:
Why build? Get from maven repo.
Hi All,
I’m (very) new to clojure (and loving it)… and I’m trying to wrap my head
around how to correctly choose doseq vs dorun for my particular use case.
I’ve read this earlier post
https://groups.google.com/forum/#!msg/clojure/8ebJsllH8UY/mXtixH3CRRsJ and
I had a clarifying question.
Don't fix it if it ain't broken =P
There shouldn't be any reason why you couldn't change it (back) to the
standard way that cloudera distributions are set up. Off the top of my
head, I can't think of anything that you're missing. But at the same time,
if your cluster is working as is, why change
I'm not aware of any way to do that. I think you're also missing the spirit
of Pig. Pig is meant to be a data workflow language. Describe a workflow
for your data using PigLatin and Pig will then compile your script to
MapReduce jobs. The number of MapReduce jobs that it generates is the
smallest
, 2013 at 10:16 AM, Pradeep Gollakota pradeep...@gmail.com
wrote:
I'm not aware of any way to do that. I think you're also missing the
spirit
of Pig. Pig is meant to be a data workflow language. Describe a workflow
for your data using PigLatin and Pig will then compile your script to
MapReduce
not
successfully registered with RM
On Fri, Oct 11, 2013 at 3:53 PM, Pradeep Gollakota
pradeep...@gmail.com wrote:
All,
I have a Yarn application that is launching a single container. The
container completes successfully but the application fails because the node
manager is killing my
There are plenty of log aggregation tools both open source and commercial
off the shelf. Here's some
http://devopsangle.com/2012/04/19/8-splunk-alternatives/
My personal recommendation is LogStash.
On Thu, Oct 10, 2013 at 10:38 PM, Raymond Tay raymondtay1...@gmail.com wrote:
You can try Chukwa
Actually... I believe that is expected behavior. Since your CPU is pegged
at 100% you're not going to be IO bound. Typically jobs tend to be CPU
bound or IO bound. If you're CPU bound you expect to see low IO throughput.
If you're IO bound, you expect to see low CPU usage.
On Thu, Oct 10, 2013
better disk throughput and, (2) CPU load is almost evenly
spread across all cores/threads (no CPU gets pegged to 100%).
On Thu, Oct 10, 2013 at 11:15 AM, Pradeep Gollakota
pradeep...@gmail.com wrote:
Actually... I believe that is expected behavior. Since your CPU is pegged
at 100% you're
Since hadoop 3.0 is 2 major versions higher, it will be significantly
different than working with hadoop 1.1.2. The hadoop-1.1 branch is
available on SVN at
http://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.1/
On Tue, Oct 1, 2013 at 11:30 PM, Karim Awara
I myself am in favor of the two branch approach. It won't block the 0.12
release and it is easier to maintain.
On Mon, Sep 30, 2013 at 12:56 PM, Jeremy Karn jk...@mortardata.com wrote:
Ok, sounds good. I'll take a shot at it tonight.
On Mon, Sep 30, 2013 at 3:48 PM, Daniel Dai
I believe it's a difference between the version that your code was compiled
against vs. the version that you're running against. Make sure that you're
not packaging hadoop jars into your jar, and make sure you're compiling
against the correct version as well.
On Sun, Sep 29, 2013 at 7:27 PM, lei
, lei liu liulei...@gmail.com wrote:
Yes, My job is compiled in CHD3u3, and I run the job on CDH4.3.1, but I
use the mr1 of CHD4.3.1 to run the job.
What are the different mr1 of cdh4 and mr of cdh3?
Thanks,
LiuLei
2013/9/30 Pradeep Gollakota pradeep...@gmail.com
I believe it's
Improper capitalization. Storage functions are case sensitive, try
JsonLoader.
On Mon, Sep 23, 2013 at 2:37 PM, jamal sasha jamalsha...@gmail.com wrote:
Hi,
I am trying to read simple json data as:
d =LOAD 'json_output' USING
JSONLOADER(('ip:chararray,_id:chararray,cats:[chararray]');
But
, at 11:24 AM, Pradeep Gollakota pradeep...@gmail.com
wrote:
Hi All,
I've been trying to write a Yarn application and I'm completely lost. I'm
using Hadoop 2.0.0-cdh4.4.0 (Cloudera distribution). I've uploaded my
sample code to github at https://github.com/pradeepg26/sample-yarn
The problem
, Pradeep Gollakota pradeep...@gmail.com
wrote:
One thing that comes to mind is that your keys are Strings which are
highly inefficient. You might get a lot better performance if you write a
custom writable for your Key object using the appropriate data types. For
example, use a long
I'm sorry but I don't understand your question. Is the output of the mapper
you're describing the key portion? If it is the key, then your data should
already be sorted by HouseHoldId since it occurs first in your key.
The SortComparator will tell Hadoop how to sort your data. So you use this
if
to do the job but i'm supposed to do this
via a MR job.. So, cannot use either of that.. Do you recommend me to try
something if i have the data in that format?
On Sat, Sep 21, 2013 at 12:26 PM, Pradeep Gollakota
pradeep...@gmail.com wrote:
I'm sorry but I don't understand your question
Be careful with your format definition... it looks like you might have a
typo.
I believe yyyy-MM-dd hh:mm:ss is the correct format.
http://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html
On Fri, Sep 20, 2013 at 8:26 AM, Ruslan Al-Fakikh metarus...@gmail.com wrote:
Doh!
I think I made a mistake myself...
yyyy-MM-dd HH:mm:ss
Since you don't have AM/PM, I'm assuming that your time is 24-hr format.
So, you need to use the 24 hour format symbol of 'H' for hour instead of
'h'.
I really hate time.
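The H-vs-h difference is easy to see with SimpleDateFormat (same pattern symbols as Joda-Time): 'HH' is hour-of-day (0-23), 'hh' is clock hour (1-12), so 18:25 renders as 06 under 'hh'.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

public class HourFormatDemo {
    public static void main(String[] args) throws Exception {
        Date sixPm = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
                .parse("2013-09-20 18:25:00");
        // 'HH' is hour-of-day (0-23); 'hh' is clock hour (1-12)
        System.out.println(new SimpleDateFormat("HH:mm").format(sixPm));
        System.out.println(new SimpleDateFormat("hh:mm").format(sixPm));
    }
}
```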
On Fri, Sep 20, 2013 at 6:25 PM, Pradeep Gollakota pradeep
Hi All,
I've been trying to write a Yarn application and I'm completely lost. I'm
using Hadoop 2.0.0-cdh4.4.0 (Cloudera distribution). I've uploaded my
sample code to github at https://github.com/pradeepg26/sample-yarn
The problem is that my application master is exiting with a status of 1
(I'm
The problem is that pig only speaks its data types. So you need to tell it
how to translate from your custom writable to a pig datatype.
Apparently elephant-bird has some support for doing this type of thing...
take a look at this SO post
fails
On Mon, Sep 16, 2013 at 6:22 PM, Pradeep Gollakota pradeep...@gmail.com
wrote:
The problem is that pig only speaks its data types. So you need to tell
it
how to translate from your custom writable to a pig datatype.
Apparently elephant-bird has some support for doing this type
to write the converters from your types to Pig data types and
pass it into the constructor of the SequenceFileLoader.
Hope this helps!
On Mon, Sep 16, 2013 at 6:56 PM, Pradeep Gollakota pradeep...@gmail.com wrote:
That's correct...
The load ... AS (k:chararray, v:chararray); doesn't actually do what
instead. Do you know what's the better choice? TreeMap
or LinkedHashMap?
Anyway thanks :)
2013/9/13 Pradeep Gollakota pradeep...@gmail.com
That's a great observation, John! The problem is that HBaseStorage maps
column families into a HashMap, so the sort ordering is completely lost.
You