Hi folks!
I put together this specification for canonicalizing the JSON type in Arrow.
## Introduction
JSON is a widely used, text-based data interchange format. There are many
use cases where a user has a column whose contents are a JSON-encoded
string. BigQuery's [JSON Type][1] and Parquet's
Pradeep Gollakota created ARROW-17255:
-
Summary: Support JSON logical type in Arrow
Key: ARROW-17255
URL: https://issues.apache.org/jira/browse/ARROW-17255
Project: Apache Arrow
Issue
Gentle thread bump.
On Thu, Jan 18, 2018 at 4:03 PM, Pradeep Gollakota <pradeep...@gmail.com>
wrote:
> Hi All,
>
> Can one of you review my PR at
> https://github.com/apache/parquet-mr/pull/447 please?
>
> Thanks,
> Pradeep
>
Hi All,
Can one of you review my PR at https://github.com/apache/parquet-mr/pull/447
please?
Thanks,
Pradeep
Luigi,
I strongly urge you to consider a 5-node ZK deployment. I've always done
that in the past for resiliency during maintenance. In a 3-node cluster,
you can only tolerate one "failure": if you bring one node down for
maintenance and another node crashes during said maintenance, your ZK
ensemble loses quorum and becomes unavailable.
Hi All,
It appears that the bottleneck in my job was the EBS volumes: very high I/O
wait times across the cluster. I was only using one volume; increasing to four
made it faster.
Thanks,
Pradeep
On Thu, Apr 20, 2017 at 3:12 PM, Pradeep Gollakota <pradeep...@gmail.com>
wrote:
> Hi All,
Hi All,
I have a simple ETL job that reads some data, shuffles it and writes it
back out. This is running on AWS EMR 5.4.0 using Spark 2.1.0.
After Stage 0 completes and the job starts Stage 1, I see a huge slowdown
in the job. The CPU usage is low on the cluster, as is the network I/O.
From
A single partition can be consumed by at most a single consumer. Consumers
compete to take ownership of a partition. So, in order to gain parallelism
you need to add more partitions.
There is a library that allows multiple consumers to consume from a single
partition
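To make the ownership model concrete, here is a minimal sketch with the Java consumer client (a reasonably recent kafka-clients is assumed; the broker address, group id, and topic name are placeholders). Run two copies against a two-partition topic and each instance will be assigned one partition; a third copy would sit idle.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class GroupConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
        props.put("group.id", "my-group");                  // consumers sharing this id split the partitions
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events"));  // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> r : records) {
                    // Each partition is owned by exactly one consumer in the group.
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            r.partition(), r.offset(), r.value());
                }
            }
        }
    }
}
```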
On Fri, Feb 10, 2017 at 10:17 AM, Lars Volker <l...@cloudera.com> wrote:
> Can you check the value of ParquetMetaData.created_by? Once you have that,
> you should see if it gets filtered by the code in CorruptStatistics.java.
>
> On Fri, Feb 10, 2017 at 7:11 PM, Pradeep Gollak
ed by the
> consumer need to be handled by some other group members."
>
> So does this mean that the consumer should inform the group ahead of
> time before it goes down? Currently, I just shutdown the process.
>
>
> On Fri, Feb 10, 2017 at 8:35 AM, Pradeep Gollakota <pr
statistics are not written to the footer? If you
> used parquet-mr, they may be there but be ignored.
>
> Cheers, Lars
>
> On Fri, Feb 10, 2017 at 5:31 PM, Pradeep Gollakota <pradeep...@gmail.com>
> wrote:
>
> > Bumping the thread to see if I get any responses.
I asked a similar question a while ago. There doesn't appear to be a way to
avoid triggering the rebalance. But I'm not sure why it would be taking > 1 hr
in your case. For us it was pretty fast.
https://www.mail-archive.com/users@kafka.apache.org/msg23925.html
On Fri, Feb 10, 2017 at 4:28 AM,
Bumping the thread to see if I get any responses.
On Wed, Feb 8, 2017 at 6:49 PM, Pradeep Gollakota <pradeep...@gmail.com>
wrote:
> Hi folks,
>
> I generated a bunch of parquet files using Spark and
> ParquetThriftOutputFormat. The thrift model has a column called "device
Hi folks,
I generated a bunch of parquet files using Spark and
ParquetThriftOutputFormat. The thrift model has a column called "deviceId"
which is a string column. It also has a "timestamp" column of int64. After
the files have been generated, I inspected the file footers and noticed
that only
Volker <l...@cloudera.com> wrote:
> I remember trying to compile with the latest version of thrift shipped in
> Ubuntu 14.04 a few weeks back and got the same error. Using 0.7 worked
> though. Sadly I don't know why it fails on a Mac.
>
> On Feb 8, 2017 21:18, "P
0 -- let us know if you have issues with
> these
>
> Thanks
> Wes
>
> On Wed, Feb 8, 2017 at 2:19 PM, Pradeep Gollakota <pradeep...@gmail.com>
> wrote:
> > Hi folks,
> >
> > I'm trying to build parquet from source. However, the instructions
Pradeep Gollakota created PARQUET-869:
-
Summary: Min/Max record counts for block size checks are not
configurable
Key: PARQUET-869
URL: https://issues.apache.org/jira/browse/PARQUET-869
Project
Usually this kind of thing can be done at a lower level in the InputFormat,
by specifying the max split size. Have you looked into that
possibility with your InputFormat?
On Sun, Jan 15, 2017 at 9:42 PM, Fei Hu wrote:
> Hi Jasbir,
>
> Yes, you are right. Do you have
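For reference, a hedged sketch of the split-size knobs mentioned above, using the Hadoop mapreduce API (the path and sizes are placeholders). FileInputFormat-based formats honor both bounds: raising the minimum merges small splits into fewer mappers, while lowering the maximum produces more, smaller splits.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeDemo {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "split-size-demo");
        FileInputFormat.addInputPath(job, new Path("/data/input"));  // placeholder path
        // Lower bound: don't create splits smaller than 128 MB.
        FileInputFormat.setMinInputSplitSize(job, 128L * 1024 * 1024);
        // Upper bound: don't create splits larger than 512 MB.
        FileInputFormat.setMaxInputSplitSize(job, 512L * 1024 * 1024);
        // ... set mapper/reducer classes and submit as usual.
    }
}
```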
and reassigns it to another member of
> the group. This happens once and then the "issue" is resolved without any
> additional interruptions.
>
> -Ewen
>
> On Thu, Jan 5, 2017 at 3:01 PM, Pradeep Gollakota <pradeep...@gmail.com>
> wrote:
>
&
Hi Kafka folks!
When a consumer is closed, it will issue a LeaveGroupRequest. Does anyone
know how long the coordinator waits before reassigning the partitions that
were assigned to the leaving consumer to a new consumer? I ask because I'm
trying to understand the behavior of consumers if you're
Worked for me if I go to https://spark.apache.org/site/ but not
https://spark.apache.org
On Wed, Jul 13, 2016 at 11:48 AM, Maurin Lenglart
wrote:
> Same here
>
>
>
> *From: *Benjamin Kim
> *Date: *Wednesday, July 13, 2016 at 11:47 AM
> *To: *manish
Just out of curiosity, if you guys are in AWS for everything, why not use
Kinesis?
On Tue, Jun 28, 2016 at 3:49 PM, Charity Majors wrote:
> Hi there,
>
> I just finished implementing kafka + autoscaling groups in a way that made
> sense to me. I have a _lot_ of experience
IIRC, TextInputFormat supports an input path that is a comma separated
list. I haven't tried this, but I think you should just be able to do
sc.textFile("file1,file2,...")
On Wed, Nov 11, 2015 at 4:30 PM, Jeff Zhang wrote:
> I know these workaround, but wouldn't it be more
Looks like what I was suggesting doesn't work. :/
On Wed, Nov 11, 2015 at 4:49 PM, Jeff Zhang <zjf...@gmail.com> wrote:
> Yes, that's what I suggest. TextInputFormat support multiple inputs. So in
> spark side, we just need to provide API to for that.
>
> On Thu, Nov 12, 2015 a
At Lithium, we have multiple datacenters and we distcp our data across our
Hadoop clusters. We have 2 DCs in NA and 1 in EU. We have a non-redundant
direct connect from our EU cluster to one of our NA DCs. If and when this
fails, we have automatic failover to a VPN that goes over the internet. The
t 2015 02:02, "James Cheng" <jch...@tivo.com> wrote:
> > >
> > >> Here’s an article that Gwen wrote earlier this year on handling large
> > >> messages in Kafka.
> > >>
> > >> http://ingest.tips/2015/01/21/handling-large-message
Fellow Kafkaers,
We have a pretty heavyweight legacy event logging system for batch
processing. We're now sending the events into Kafka for realtime
analytics. But we have some pretty large messages (> 40 MB).
I'm wondering if any of you have use cases where you have to send large
messages
To add a little more context to Shaun's question, we have around 400
customers. Each customer has a stream of events. Some customers generate a
lot of data while others don't. We need to ensure that each customer's data
is sorted globally by timestamp.
We have two use cases around consumption:
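Per-customer ordering like that maps naturally onto Kafka's partitioning: key every event by customer id, since order is only guaranteed within a partition. The default partitioner already hashes the key; the sketch below just makes that behavior explicit (the class name is an assumption, and keys are assumed non-null).

```java
import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.utils.Utils;

public class CustomerPartitioner implements Partitioner {
    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        // All events for one customer id hash to the same partition,
        // so their relative order is preserved.
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    @Override public void close() {}
    @Override public void configure(Map<String, ?> configs) {}
}
```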
Hi all,
I have an external table with the following DDL.
```
DROP TABLE IF EXISTS raw_events;
CREATE EXTERNAL TABLE IF NOT EXISTS raw_events (
raw_event_string string)
PARTITIONED BY (dc string, community string, dt string)
STORED AS TEXTFILE
LOCATION
```
iveInputFormat not working
>
>
>
> what are your values for:
>
> mapred.min.split.size
>
> mapred.max.split.size
>
> hive.hadoop.supports.splittable.combineinputformat
>
>
>
>
>
> *From:* Pradeep Gollakota [mailto:pradeep...@gmail.com]
> *Sent:* Wed
Depending on your Hadoop distro and version, be potentially aware of
>
> https://issues.apache.org/jira/browse/MAPREDUCE-1597
>
> and
>
> https://issues.apache.org/jira/browse/MAPREDUCE-5537
>
>
>
> test it and see...
>
>
>
> *From:* Pradeep Gollakota [mailto:pradeep.
actual partitions
in the table but simply partitioned data in HDFS, give it a shot. It may be
worthwhile looking into optimizations for this use case.
-Slava
On Thu, Jun 11, 2015 at 11:56 AM, Pradeep Gollakota pradeep...@gmail.com
wrote:
Hi All,
I have a table which is partitioned on two
I actually decided to remove one of my 2 partition columns and make it a
bucketing column instead... same query completed fully in under 10 minutes
with 92 partitions added. This will suffice for me for now.
On Thu, Jun 11, 2015 at 2:25 PM, Pradeep Gollakota pradeep...@gmail.com
wrote:
Hmm
Hi All,
I have a table which is partitioned on two columns (customer, date). I'm
loading some data into the table using a Hive query. The MapReduce job
completed within a few minutes and needs to commit the data to the
appropriate partitions. There were about 32000 partitions generated. The
:37 PM, Pradeep Gollakota
pradeep...@gmail.com wrote:
Hi All,
I'm writing an MR job to read data using HCatInputFormat... however, the
job is generating too many splits. I don't have this problem when running
queries in Hive since it combines splits by default.
Is there an equivalent in MR
Hi All,
I'm writing an MR job to read data using HCatInputFormat... however, the
job is generating too many splits. I don't have this problem when running
queries in Hive since it combines splits by default.
Is there an equivalent in MR so that I'm not generating thousands of
mappers?
Thanks,
Also, mapred job -kill job_id
On Sun, Apr 12, 2015 at 11:07 AM, Shahab Yunus shahab.yu...@gmail.com
wrote:
You can kill it by using the following yarn command:
yarn application -kill <application_id>
https://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/YarnCommands.html
Or use
If I understood your question correctly, you want to be able to read the
output of Camus in Hive and be able to know partition values. If my
understanding is right, you can do so by using the following.
Hive provides the ability to provide custom patterns for partitions. You
can use this in
Apparently I joined this list at the right time :P
On Sat, Feb 7, 2015 at 4:40 PM, Jay Kreps jay.kr...@gmail.com wrote:
I closed about 350 redundant or obsolete issues. If I closed an issue you
think is not obsolete, my apologies, just reopen.
-Jay
[
https://issues.apache.org/jira/browse/KAFKA-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310241#comment-14310241
]
Pradeep Gollakota commented on KAFKA-1884:
--
[~guozhang] That's what I figured
[
https://issues.apache.org/jira/browse/KAFKA-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310548#comment-14310548
]
Pradeep Gollakota commented on KAFKA-1884:
--
I guess that makes sense... I'll
[
https://issues.apache.org/jira/browse/KAFKA-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308670#comment-14308670
]
Pradeep Gollakota commented on KAFKA-1884:
--
What makes the behavior in #2 earlier
[
https://issues.apache.org/jira/browse/KAFKA-1884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308539#comment-14308539
]
Pradeep Gollakota commented on KAFKA-1884:
--
I'd like to work on this. Please
Lithium Technologies would love to host you guys for a release party in SF
if you guys want.
:)
On Tue, Feb 3, 2015 at 11:04 AM, Gwen Shapira gshap...@cloudera.com wrote:
When's the party?
:)
On Mon, Feb 2, 2015 at 8:13 PM, Jay Kreps jay.kr...@gmail.com wrote:
Yay!
-Jay
On Mon,
I don't think this is doable using the out-of-the-box regexp_replace() UDF.
The way I would do it is to use a file to create a mapping between a
regexp and its replacement, and write a custom UDF that loads this file and
applies all the regular expressions to the input.
Hope this helps.
On Tue, Feb
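A minimal sketch of that custom-UDF idea for Hive (the class name, the tab-separated file format, and the lazy load are all assumptions; the old-style UDF base class is used for brevity):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import org.apache.hadoop.hive.ql.exec.UDF;

public class MultiRegexpReplace extends UDF {
    private Map<String, String> rules;  // pattern -> replacement, in file order

    private void loadRules(String path) throws IOException {
        rules = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split("\t", 2);  // assumed: pattern<TAB>replacement
                if (parts.length == 2) rules.put(parts[0], parts[1]);
            }
        }
    }

    public String evaluate(String input, String rulesPath) throws IOException {
        if (input == null) return null;
        if (rules == null) loadRules(rulesPath);       // load the file once per task
        String out = input;
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            out = out.replaceAll(rule.getKey(), rule.getValue());
        }
        return out;
    }
}
```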
This is a great question Otis. Like Gwen said, you can accomplish Sync mode
by setting the batch size to 1. But this does highlight a shortcoming of
the new producer API.
I really like the design of the new API and it has really great properties
and I'm enjoying working with it. However, once API
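Besides batch size 1, the usual way to get synchronous semantics from the new producer is to block on the Future returned by send(). A hedged sketch (broker, topic, and serializer choices are placeholders):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class SyncSend {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // placeholder broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // send() is asynchronous; get() blocks until the broker acks,
            // which gives per-record synchronous behavior.
            RecordMetadata md =
                    producer.send(new ProducerRecord<>("events", "key", "value")).get();
            System.out.println("acked at partition " + md.partition()
                    + ", offset " + md.offset());
        }
    }
}
```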
it to work?
Gwen
On Mon, Feb 2, 2015 at 1:38 PM, Pradeep Gollakota pradeep...@gmail.com
wrote:
This is a great question Otis. Like Gwen said, you can accomplish Sync
mode
by setting the batch size to 1. But this does highlight a shortcoming of
the new producer API.
I really like the design
Hi Bhavesh,
At Lithium, we don't run Camus in our pipelines yet, though we plan to. But
I just wanted to comment regarding speculative execution. We have it
disabled at the cluster level and typically don't need it for most of our
jobs. Especially with something like Camus, I don't see any need
with the CSV where the f1
is stored as a string? The CSV data would look like this:
10,abc
20,xyz
30,lmn
... etc.
Thanks, Amit
On Monday, February 2, 2015 3:37 AM, Pradeep Gollakota
pradeep
Just to clarify, do you have a semicolon after "f1 > 20"?
A = LOAD 'data' USING PigStorage(',');
B = FOREACH A GENERATE f1;
C = FILTER B BY f1 > 20;
DUMP C;
This should be correct.
On Sun, Feb 1, 2015 at 4:50 PM, Amit am...@yahoo.com.invalid wrote:
Hello, I am trying to run an ad-hoc Pig script
Hi All,
I'm trying to establish a good pattern and practice with Oozie for sharing
a setup file for pig/hive. For example, I have several scripts that use a
set of UDFs that are built in-house. In order to use the UDF, I need to add
the jar file, and then register the UDF. Rather than repeating
Viswanath
On 16-Jan-2015, at 12:34, Pradeep Gollakota pradeep...@gmail.com
wrote:
Just out of curiosity, why are you using SET to set the solr collection?
I'm not sure if you're using an out of the box Load/Store Func, but if I
were to design it, I would use the location of a Load/Store
If you're using maven AND using surefire plugin 2.7.3+ AND using Junit 4,
then you can do this by specifying -Dtest=TestClass#methodName
ref:
http://maven.apache.org/surefire/maven-surefire-plugin/examples/single-test.html
On Thu, Jan 15, 2015 at 8:02 PM, Cheolsoo Park piaozhe...@gmail.com
Just out of curiosity, why are you using SET to set the solr collection?
I'm not sure if you're using an out of the box Load/Store Func, but if I
were to design it, I would use the location of a Load/Store Func to
specify which solr collection to write to.
Is it possible for you to redesign this
is not possible with Akka 2.3.x.
It might be supported later, see https://github.com/akka/akka/issues/13961
and https://github.com/akka/akka/issues/13964
Regards,
Patrik
On Tue, Dec 30, 2014 at 8:58 PM, Pradeep Gollakota prade...@gmail.com
javascript: wrote:
Hi All,
I’m trying to create
Hi All,
I’m trying to create an ActorSystem where a set of actors have a shared
mailbox that’s prioritized. I’ve tested my mailbox without using the
BalancingPool router, and the messages are correctly prioritized. However,
when I try to create the actors using BalancingPool, the messages
@Joe, Achanta is using Indian English numerals which is why it's a little
confusing. http://en.wikipedia.org/wiki/Indian_English#Numbering_system
1,00,000 [1 lakh] (Indian English) == 100,000 [1 hundred thousand] (The
rest of the world :P)
On Fri Dec 19 2014 at 9:40:29 AM Achanta Vamsi Subhash
Hi Aaron,
Just out of curiosity, have you considered using asynchbase?
https://github.com/OpenTSDB/asynchbase
On Fri, Dec 19, 2014 at 9:00 AM, Nick Dimiduk ndimi...@apache.org wrote:
Hi Aaron,
Your analysis is spot on and I do not believe this is by design. I see the
write buffer is owned
This doesn't answer your question per se, but this is how we dealt with
load on HBase at Lithium. We power klout.com with HBase. On a nightly
basis, we load user profile data and Klout scores for approx. 600 million
users into HBase. We also do maintenance on HBase such as major compactions
on a
Lithium (Klout) powers www.klout.com with HBase. The operations team is 2
full time engineers + the manager (who also does hands on operations work
with the team). This operations team is responsible for the entirety of our
Hadoop stack including the HBase clusters. We have one 165 node Hive
Java Strings are immutable, so pdfText.concat() returns a new String and
the original string is left unmolested. So at the end, all you're doing is
returning an empty string. Instead, you can do pdfText =
pdfText.concat(...). But the better way to write it is to use a
StringBuilder.
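A minimal illustration of the difference (variable names are placeholders):

```java
public class ConcatDemo {
    public static void main(String[] args) {
        String pdfText = "";
        pdfText.concat("page1");           // result is discarded; pdfText is still ""
        pdfText = pdfText.concat("page1"); // reassignment is required to keep the result

        // StringBuilder mutates in place and avoids one copy per concatenation.
        StringBuilder sb = new StringBuilder();
        sb.append("page1").append("page2");
        System.out.println(sb.toString()); // page1page2
    }
}
```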
it.
- Pradeep
On Fri Dec 05 2014 at 9:18:16 AM Pradeep Gollakota pradeep...@gmail.com
wrote:
Java Strings are immutable. So pdfText.concat() returns a new String
and the original string is left unmolested. So at the end, all you're doing
is returning an empty string. Instead, you can do pdfText
how to best do things within the
Pig/MapReduce/Hadoop framework
Ryan
On Fri, Dec 5, 2014 at 1:35 PM, Ryan freelanceflashga...@gmail.com
wrote:
Thanks Pradeep! I'll give it a try and report back
Ryan
On Fri, Dec 5, 2014 at 12:30 PM, Pradeep Gollakota pradeep...@gmail.com
wrote
There is a built in storage handler for HBase. Take a look at the docs at
https://pig.apache.org/docs/r0.14.0/api/org/apache/pig/backend/hadoop/hbase/HBaseStorage.html
It doesn't support dealing with salted rowkeys (or reverse timestamps) out
of the box, so you may have to munge the data a
Can you expand on your use case a little bit please? It may be that you're
duplicating functionality.
You can take a look at CombineFileInputFormat for inspiration. If this
is indeed taking a long time, one cheap-to-implement thing you can do is to
parallelize the calls to get block
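A hedged sketch of that parallelization, fanning the block-location lookups out to a thread pool (the pool size and input path are assumptions):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ParallelBlockLocations {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus[] files = fs.listStatus(new Path("/data/input"));  // placeholder path
        ExecutorService pool = Executors.newFixedThreadPool(16);      // assumed pool size
        List<Future<BlockLocation[]>> futures = new ArrayList<>();
        for (FileStatus file : files) {
            // Each lookup is an independent namenode RPC, so they parallelize well.
            futures.add(pool.submit(() -> fs.getFileBlockLocations(file, 0, file.getLen())));
        }
        for (Future<BlockLocation[]> future : futures) {
            System.out.println(future.get().length + " block location(s)");
        }
        pool.shutdown();
    }
}
```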
(fieldDescriptor.toProto());
}
}
}
On Saturday, November 1, 2014 4:26:35 AM UTC-7, Oliver wrote:
On 1 November 2014 02:24, Pradeep Gollakota prade...@gmail.com
javascript: wrote:
Confirmed... When I replaced the md variable with the compiled
Descriptor
, October 30, 2014 2:41:19 PM UTC-7, Oliver wrote:
On 30 October 2014 02:53, Pradeep Gollakota prade...@gmail.com
javascript: wrote:
I have a use case where I need to parse messages without having the
corresponding precompiled classes in Java. So the DynamicMessage seems
, then I'd go with
using the parsed descriptors as your format description, not the text
.proto file.
Oliver
On 31 October 2014 17:56, Pradeep Gollakota prade...@gmail.com
javascript: wrote:
Hi Oliver,
Thanks for the response! I guess my question wasn't quite clear. In my
java
on this?
Thanks again all!
On Friday, October 31, 2014 2:18:44 PM UTC-7, Ilia Mirkin wrote:
At no point are you specifying that you want to use the
MessagePublish descriptor, so you must still be using the API
incorrectly...
On Fri, Oct 31, 2014 at 5:10 PM, Pradeep Gollakota prade...@gmail.com
been
annotated with the (isPii = true) option.
On Friday, October 31, 2014 3:25:51 PM UTC-7, Ilia Mirkin wrote:
On Fri, Oct 31, 2014 at 6:18 PM, Pradeep Gollakota prade...@gmail.com
javascript: wrote:
Boolean extension =
fieldDescriptor.getOptions().getExtension(Messages.isPii
try to mix pregenerated code
dynamically loaded descriptors, all sorts of things break.
Oliver
On 1 November 2014 00:48, Pradeep Gollakota pradeep...@gmail.com wrote:
Not really... one of the use cases I'm trying to solve for is an
anonymization use case. We will have several app's writing
Hi Protobuf gurus,
I'm trying to parse a .proto file in Java to use with DynamicMessages. Is
this possible or does it have to be compiled to a descriptor set file
first before this can be done?
I have a use case where I need to parse messages without having the
corresponding precompiled
At Lithium, we power Klout using HBase. We load Klout scores for about 500
million users into HBase every night. When a load is happening, we noticed
that the performance of klout.com was severely degraded. We also see
severely degraded performance when performing operations like compactions.
In
This is a great question!
I could be wrong, but I don't believe there is a way to indicate this for a
group-by. It definitely does matter for performance if your input is
globally sorted. Currently a group-by happens on the reduce side. But if the
input is globally sorted, this can happen map side
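A toy sketch of why global sort order helps: over key-sorted input, a group-by collapses to a single streaming pass with no shuffle (the rows here are assumed stand-ins for sorted map input):

```java
import java.util.Arrays;
import java.util.List;

public class SortedGroupBy {
    public static void main(String[] args) {
        // Assumed (key, value) rows, already globally sorted by key.
        List<String[]> rows = Arrays.asList(
                new String[]{"a", "1"}, new String[]{"a", "2"}, new String[]{"b", "5"});
        String currentKey = null;
        long sum = 0;
        for (String[] row : rows) {
            if (!row[0].equals(currentKey)) {
                // A new key means the previous group is complete: emit it.
                if (currentKey != null) System.out.println(currentKey + " -> " + sum);
                currentKey = row[0];
                sum = 0;
            }
            sum += Long.parseLong(row[1]);
        }
        if (currentKey != null) System.out.println(currentKey + " -> " + sum);
    }
}
```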
Hi Ankur,
Is the list of regular expressions static or dynamic? If it's a static
list, you can collapse all the filter operators into a single operator and
use the AND keyword to combine them.
E.g.
Filtered_Data = FILTER BagName BY ($0 matches 'RegEx-1') AND ($0 matches
'RegEx-2') AND ($0
In case you haven't seen this already, take a look at
http://pig.apache.org/docs/r0.13.0/perf.html for some basic strategies on
optimizing your pig scripts.
On Mon, Oct 6, 2014 at 1:08 PM, Russell Jurney russell.jur...@gmail.com
wrote:
Actually, I don't think you need SelectFieldByValue. Just
suggestions.
Sorry for not being clear with the specification in the first place.
Thanks.
On Mon, Oct 6, 2014 at 4:12 PM, Pradeep Gollakota pradeep...@gmail.com
wrote:
In case you haven't seen this already, take a look at
http://pig.apache.org/docs/r0.13.0/perf.html for some basic strategies
I agree with the answers suggested above.
3. B
4. D
5. C
On Mon, Oct 6, 2014 at 2:58 PM, Ulul had...@ulul.org wrote:
Hi
No, Pig is a data manipulation language for data already in Hadoop.
The question is about importing data from an OLTP DB (e.g. Oracle, MySQL...) to
Hadoop; this is what Sqoop
Looks like you're facing the same problem as this SO question:
http://stackoverflow.com/questions/10705140/hadoop-datanode-fails-to-start-throwing-org-apache-hadoop-hdfs-server-common-sto
Try the suggested fix.
On Fri, Oct 3, 2014 at 6:57 PM, Colin Kincaid Williams disc...@uw.edu
wrote:
We had a
It appears to be randomly chosen. I just came across this blog post from
Lars George about HBase file locality in HDFS
http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html
On Thu, Oct 2, 2014 at 4:12 PM, SF Hadoop sfhad...@gmail.com wrote:
What is the block placement policy hadoop
Can you post the client code you're using to read/write from HBase?
On Wed, Aug 13, 2014 at 11:21 AM, kacperolszewski kacperolszew...@o2.pl
wrote:
Hello there, I'm running a read/write benchmark on a huge dataset (Twitter
posts) for my school project.
The problem I'm dealing with is that the
I think there's a problem with your schema.
{DataASet: (A1: int,A2: int,DataBSets: {DataBSet: (B1: chararray,B2:
chararray)})}
should probably look like
{DataASet: (A1: int,A2: int,DataBSets: {(DataBSet: (B1: chararray,B2:
chararray))})}
On Thu, Aug 7, 2014 at 11:22 AM, Klüber, Ralf
-
From: Pradeep Gollakota [mailto:pradeep...@gmail.com]
Sent: Friday, August 08, 2014 2:21 PM
To: user@pig.apache.org
Subject: Re: Json Loader - Array of objects - Loading results in empty
data set
I think there's a problem with your schema.
{DataASet: (A1: int,A2: int,DataBSets
Hi All,
Is it possible to do a rolling upgrade from Hadoop 2.2 to 2.4?
Thanks,
Pradeep
Hi All,
I was watching the talks from the Kafka meet up at LinkedIn last month.
While answering a question on producers spilling to disk, Neha mentioned
that there was a Go client that had this feature. I was wondering if the
client that does this is https://github.com/Shopify/sarama/issues.
I'm
i. Equals can be mimicked by specifying both <= and >= (i.e. -lte=123
-gte=123)
ii. What do you mean by taking a partial rowkey? the lte and gte are
partial matches.
On Mon, Jun 30, 2014 at 10:24 PM, Nivetha K nivethak3...@gmail.com wrote:
Hi,
I am working with Pig.
I need to know some
consider my rowkeys are
123456, 123678, 123678, 124789, 124789... I need to take the rowkeys that
start with 123
On 1 July 2014 11:36, Pradeep Gollakota pradeep...@gmail.com wrote:
i. Equals can be mimicked by specifying both <= and >= (i.e. -lte=123
-gte=123)
ii. What do you mean
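For reference, prefix matching in the HBase client API works the same way as those flags: rows starting with "123" fall in the range ["123", "124"). A hedged sketch (the table name is a placeholder):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PrefixScan {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("mytable"))) {  // placeholder
            Scan scan = new Scan();
            scan.setStartRow(Bytes.toBytes("123"));  // inclusive lower bound
            scan.setStopRow(Bytes.toBytes("124"));   // exclusive upper bound: next prefix
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result result : scanner) {
                    System.out.println(Bytes.toString(result.getRow()));
                }
            }
        }
    }
}
```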
I'm actually not convinced that encryption needs to be handled server side
in Kafka. I think the best solution for encryption is to handle it
producer/consumer side just like compression. This will offload key
management to the users and we'll still be able to leverage the sendfile
optimization
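A hedged sketch of that producer-side approach: wrap the value serializer so the broker only ever sees ciphertext (a recent kafka-clients with default Serializer methods is assumed; the cipher choice is purely illustrative, and key management is left to the caller):

```java
import java.security.Key;
import javax.crypto.Cipher;
import org.apache.kafka.common.serialization.Serializer;

public class EncryptingSerializer implements Serializer<byte[]> {
    private final Key key;

    public EncryptingSerializer(Key key) {
        this.key = key;  // key distribution/rotation stays with the application
    }

    @Override
    public byte[] serialize(String topic, byte[] data) {
        try {
            // Illustrative only: "AES" defaults to ECB, which is not suitable
            // for production. Use an authenticated mode such as AES/GCM instead.
            Cipher cipher = Cipher.getInstance("AES");
            cipher.init(Cipher.ENCRYPT_MODE, key);
            return cipher.doFinal(data);
        } catch (Exception e) {
            throw new RuntimeException("encryption failed", e);
        }
    }
}
```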
, Pradeep Gollakota pradeep...@gmail.com
wrote:
Ambari has a concept of custom stacks. So, you can write a custom stack
to deploy Druid. At installation time, you can choose to install your Druid
stack but not the Hadoop stack.
On Wed, Jun 4, 2014 at 9:21 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com
[
https://issues.apache.org/jira/browse/AMBARI-5707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017422#comment-14017422
]
Pradeep Gollakota commented on AMBARI-5707:
---
I too agree that it may
FOREACH A GENERATE cust_id, cust_name, FLATTEN(cust_address), cust_email;
On Sun, Jun 1, 2014 at 5:54 PM, Rahul Channe drah...@googlemail.com wrote:
Hi All,
I have imported a Hive table into Pig having a complex data type
(array<string>). The alias in Pig looks as below:
grunt> describe A;
Disregard last email.
Sorry... didn't fully understand the question.
On Mon, Jun 2, 2014 at 8:44 AM, Pradeep Gollakota pradeep...@gmail.com
wrote:
FOREACH A GENERATE cust_id, cust_name, FLATTEN(cust_address), cust_email;
On Sun, Jun 1, 2014 at 5:54 PM, Rahul Channe drah
There was a similar question as this on StackOverflow a while back. The
suggestion was to write a custom BagToTuple UDF.
http://stackoverflow.com/questions/18544602/how-to-flatten-a-group-into-a-single-tuple-in-pig
On Mon, Jun 2, 2014 at 8:46 AM, Pradeep Gollakota pradeep...@gmail.com
wrote
,florida)
grunt> describe B;
B: {org.apache.pig.builtin.bagtotuple_cust_address_34::innerfield:
chararray}
I am not able to separate the fields in B as $0, $1 and $3; tried using
STRSPLIT but it didn't work.
On Mon, Jun 2, 2014 at 11:50 AM, Pradeep Gollakota pradeep...@gmail.com
wrote
BagToTuple(cust_address);
grunt> describe B;
B: {org.apache.pig.builtin.bagtotuple_cust_address_24: (innerfield:
chararray)}
grunt> dump B;
((2200,benjamin franklin,philadelphia))
((44,atlanta franklin,florida))
On Mon, Jun 2, 2014 at 12:59 PM, Pradeep Gollakota pradeep
Hortonworks has written a bridge tool to help with this. As far as I know,
this will only work for replicating from a 0.94 cluster to a 0.96 cluster.
Check out https://github.com/hortonworks/HBaseReplicationBridgeServer
On Mon, Jun 2, 2014 at 7:35 AM, yanivG yaniv.yancov...@gmail.com wrote:
@Mehmet... great hack! I like it :-P
On Tue, May 27, 2014 at 5:08 PM, Mehmet Tepedelenlioglu
mehmets...@yahoo.com wrote:
If you know how many items you want from each inner bag exactly, you can
hack it like this:
x = foreach x {
y = foreach x generate RANDOM() as rnd, *;
y =