I don't think this is doable using the out-of-the-box regexp_replace() UDF. The way I would do it is to use a file that maps each regexp to its replacement, and write a custom UDF that loads this file and applies all of the regular expressions to the input.
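As a rough sketch of that idea (not tested against Hive 0.13 itself), the core of such a UDF could look something like the code below. The class name NgramReplacer, the tab-separated file format, and the fromFile helper are my own assumptions for illustration. To plug it into Hive you would call apply() from the evaluate() method of a class extending org.apache.hadoop.hive.ql.exec.UDF.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper holding the regexp -> replacement mapping.
// In Hive, a UDF class would own one instance and delegate to apply().
final class NgramReplacer {
    private final List<Pattern> patterns = new ArrayList<>();
    private final List<String> replacements = new ArrayList<>();

    // Assumed rule format: one rule per line, "regex<TAB>replacement".
    NgramReplacer(List<String> ruleLines) {
        for (String line : ruleLines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = line.split("\t", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Bad rule: " + line);
            }
            // Compile once up front so each row only pays for matching.
            patterns.add(Pattern.compile(parts[0]));
            replacements.add(parts[1]);
        }
    }

    // Loads the rules from a local file (the path is supplied by the caller).
    static NgramReplacer fromFile(String path) throws IOException {
        return new NgramReplacer(
                Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8));
    }

    // Applies every rule in file order; returns null for null input.
    String apply(String input) {
        if (input == null) {
            return null;
        }
        String out = input;
        for (int i = 0; i < patterns.size(); i++) {
            out = patterns.get(i).matcher(out).replaceAll(replacements.get(i));
        }
        return out;
    }
}
```

With a mapping file containing the two lines "hip hop<TAB>hiphop" and "rock music<TAB>rockmusic", apply() would cover both of your examples in one pass over the rules. Rules run in file order, so an earlier replacement can affect what a later pattern matches. Also keep in mind that "$" and "\" have special meaning in Java replacement strings.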
Hope this helps.

On Tue, Feb 3, 2015 at 10:46 AM, Viral Parikh <viral.j.par...@gmail.com> wrote:

> Hi Everyone,
>
> I am using hive 0.13! I want to find multiple tokens like "hip hop" and
> "rock music" in my data and replace them with "hiphop" and "rockmusic" -
> basically replace them without white space. I have used the regexp_replace
> function in hive. Below is my query and it works great for above 2 examples.
>
> drop table vp_hiphop;
> create table vp_hiphop as
> select userid, ntext,
>        regexp_replace(regexp_replace(ntext, 'hip hop', 'hiphop'), 'rock music', 'rockmusic') as ntext1
> from vp_nlp_protext_males;
>
> But I have 100 such bigrams/ngrams and want to be able to do replace
> efficiently where I just remove the whitespace. I can pattern match the
> phrase - hip hop and rock music but in the replace I want to simply trim
> the white spaces. Below is what I tried. I also tried using trim with
> regexp_replace but it wants the third argument in the regexp_replace
> function.
>
> drop table vp_hiphop;
> create table vp_hiphop as
> select userid, ntext,
>        regexp_replace(ntext, '(hip hop)|(rock music)') as ntext1
> from vp_nlp_protext_males;