Hi,

We're new to ElasticSearch but quite impressed with the tools & community. 
Currently we're using it to build an in-house chat/collaboration solution. 

In a nutshell: it's an in-house version of HipChat that's tightly integrated 
with our existing business software.

ES is the back end of the system: we use it to store and retrieve 
messages.

Our next challenge is to allow users (and the system) to search for things 
like "@firstname lastname", "firstname lastname", tags like #hashtag or 
file#1234, email addresses, links to files, URLs, employee names, and so 
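on. The standard tokenizer/analyzer strips characters like "@" and "#" when 
it splits text into tokens, so these searches don't work out of the box. We 
could simply map the "body" field as "not_analyzed" in the index, but then 
the whole message would be indexed as a single term and we'd lose most 
searchability.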

A special wrinkle here is that username mentions may include spaces, like 
"@Bob Smith", and so may employee names written without the "@" prefix, 
like "Bob Smith".

We're considering several approaches here and could really use 
feedback/critique!

Approach #1: Map the "body" field twice (a multi-field): once with the 
standard tokenizer/analyzer and once as not_analyzed. We could use 
the not_analyzed version to search on things like @mentions and #hashtags. 
(This would nearly double our storage requirements, right?)
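
To make Approach #1 concrete, here is a rough sketch of what such a mapping might look like, written as a Python dict that serializes to the JSON request body. It assumes the ES 1.x "string" type and the "fields" multi-field syntax; the type name "message" and the sub-field name "raw" are made up for illustration, and only "body" comes from our actual schema.

```python
import json

# Hypothetical names: "message" (type) and "raw" (sub-field).
# Assumes the ES 1.x "string" type with the "fields" multi-field syntax.
MESSAGE_MAPPING = {
    "message": {
        "properties": {
            "body": {
                # Analyzed with the default (standard) analyzer.
                "type": "string",
                "fields": {
                    "raw": {
                        "type": "string",
                        # Indexed verbatim as a single term, no tokenizing.
                        "index": "not_analyzed",
                    }
                },
            }
        }
    }
}


def mapping_json():
    """Serialize the mapping to the JSON body a put-mapping request would carry."""
    return json.dumps(MESSAGE_MAPPING, indent=2, sort_keys=True)
```

With a mapping like this, full-text queries would go against "body" and exact queries against "body.raw". Since "body.raw" holds the whole message as one term, finding an @mention inside it would presumably need a wildcard or regexp query rather than a plain term match.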

Approach #2: Implementing a custom tokenizer that treats things like 
@username, "@Bob Smith", "Bob Smith" and #hashtags as single tokens so that 
we can search on them later. We'd base it on something like ES's 
uax_url_email tokenizer: 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-uaxurlemail-tokenizer.html
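
To show the token stream we'd want from Approach #2, here is a small Python simulation. It is not Elasticsearch code, just a regex-based sketch of the behavior a custom tokenizer would need. The function names and the known_names list (a lookup of employee names, needed because nothing else tells us where "@Bob Smith" ends) are assumptions, and the URL and email patterns are deliberately crude.

```python
import re


def build_token_pattern(known_names):
    """Build one regex whose alternatives are tried in priority order."""
    # Longest names first so a longer name wins over its own prefix.
    names = sorted(known_names, key=len, reverse=True)
    parts = []
    if names:
        name_alt = "|".join(re.escape(n) for n in names)
        parts.append(r"@?(?:%s)\b" % name_alt)  # "@Bob Smith" or "Bob Smith"
    parts += [
        r"https?://\S+",                  # URLs (crude)
        r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+",  # email addresses (crude)
        r"@\w+",                          # single-word @mentions
        r"\w*#\w+",                       # #hashtag and file#1234
        r"\w+",                           # ordinary words
    ]
    return re.compile("|".join(parts), re.IGNORECASE)


def tokenize(text, known_names=()):
    """Return lowercased tokens, keeping mentions, tags, emails and URLs whole."""
    pattern = build_token_pattern(known_names)
    return [m.group(0).lower() for m in pattern.finditer(text)]
```

For example, tokenize("Ask @Bob Smith. about file#1234 #urgent", ["Bob Smith"]) yields "ask", "@bob smith", "about", "file#1234" and "#urgent" as five separate tokens.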

Approach #3: In our application layer (a Rails application) we could parse 
out @usernames before saving the record to ElasticSearch. We could then 
save the @usernames in a separate ES array field. Same for #hashtags and so 
forth. So in addition to the "body" field we'd have a field called 
"mentions", a field called "hashtags", a field called "hyperlinks", and so 
forth.
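
A minimal sketch of the parsing step in Approach #3, shown in Python for brevity (the same regexes should port to Ruby for the Rails app). The field names "mentions", "hashtags" and "hyperlinks" are the ones described above; the function name, the regexes and the employee_names list are assumptions for illustration, not a finished parser.

```python
import re

# "@" not preceded by an email-like character, so bob@example.com is skipped.
MENTION_RE = re.compile(r"(?<![\w.+-])@(\w+)")
# "#" not preceded by a word character, so file#1234 is not a hashtag.
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")
URL_RE = re.compile(r"https?://[^\s<>\"]+")


def dedupe(items):
    """Drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(items))


def extract_entities(body, employee_names=()):
    """Build the document to index: the body plus one array field per entity type."""
    hyperlinks = URL_RE.findall(body)
    # Blank out URLs so a "#fragment" is not mistaken for a hashtag.
    text = URL_RE.sub(" ", body)

    mentions = []
    # Full-name mentions first (longest first) so "@Bob Smith" is not cut to "@Bob".
    for name in sorted(employee_names, key=len, reverse=True):
        full = re.compile(r"(?<![\w.+-])@" + re.escape(name) + r"\b", re.IGNORECASE)
        if full.search(text):
            mentions.append(name)
            text = full.sub(" ", text)
    mentions += MENTION_RE.findall(text)

    return {
        "body": body,
        "mentions": dedupe(mentions),
        "hashtags": dedupe(HASHTAG_RE.findall(text)),
        "hyperlinks": dedupe(hyperlinks),
    }
```

The returned dict would be indexed as-is, so "body" stays fully searchable with the standard analyzer while the extra fields support exact lookups.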

How would you do it?
