Hi, we're new to Elasticsearch but quite impressed with the tools and community. We're using it to build an in-house chat/collaboration solution.
In a nutshell: it's an in-house version of HipChat that's tightly integrated with our existing business software. Currently we're using ES as the back end of the system to store and retrieve messages.

Our next challenge is to allow users (and the system) to search for things like "@firstname lastname", "firstname lastname", tags like #hashtag or file#1234, email addresses, links to files, URLs, employee names, and so on. These searches aren't possible with the standard tokenizer/analyzer, which drops characters like "@" and "#" and splits names at the space. We could simply map the "body" field as "not_analyzed" in the index, but then we'd lose most searchability.

A special wrinkle here is that username mentions may include spaces, like "@Bob Smith.", and employee names like "Bob Smith" may appear without the "@" prefix.

We're considering several approaches here and could really use feedback/critique!

Approach #1: Multiple mappings for the "body" field: one that uses the standard tokenizer/analyzer, and one that is not_analyzed. We could use the not_analyzed version to search on things like @mentions and #hashtags. (This would nearly double our storage requirements, right?)

Approach #2: Implement a custom tokenizer that treats things like @username, "@Bob Smith", "Bob Smith" and #hashtags as single tokens so that we can search on them later. We'd base it on something like ES's URL/email tokenizer: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-uaxurlemail-tokenizer.html

Approach #3: In our application layer (a Rails application) we could parse out @usernames before saving the record to Elasticsearch. We could then save the @usernames in a separate ES array field. Same for #hashtags and so forth. So in addition to the "body" field we'd have a field called "mentions", a field called "hashtags", a field called "hyperlinks", and so forth.

How would you do it?
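To make Approach #3 more concrete, here is a rough sketch of the parsing step we have in mind. It is written in Python for brevity (the real version would live in our Rails app), and everything in it is an assumption for illustration: the function name `extract_entities`, the regular expressions, the `known_names` list of employee names, and the 1.x-style mapping are all hypothetical and not something we have running.

```python
import re

# Hypothetical patterns; real usernames, tags and links may need stricter rules.
URL_RE = re.compile(r"https?://[^\s<>\"]+")
HANDLE_RE = re.compile(r"(?<!\w)@(\w+)")    # @alice, but not the "@" in bob@example.com
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")   # #release, but not the "#" in file#1234

# Assumed mapping (ES 1.x-style syntax): "body" stays analyzed for full-text
# search, while the extracted fields are stored as exact, unanalyzed terms.
MESSAGE_MAPPING = {
    "properties": {
        "body": {"type": "string"},
        "mentions": {"type": "string", "index": "not_analyzed"},
        "hashtags": {"type": "string", "index": "not_analyzed"},
        "hyperlinks": {"type": "string", "index": "not_analyzed"},
    }
}


def extract_entities(body, known_names=()):
    """Build the document to index: the raw body plus extracted array fields.

    known_names is an assumed list of employee names (e.g. "Bob Smith") used
    to recognise multi-word mentions with or without the "@" prefix.
    """
    # Pull out links first so fragments such as "#intro" inside a URL are
    # not mistaken for hashtags, and trim trailing sentence punctuation.
    hyperlinks = [u.rstrip(".,;:!?)") for u in URL_RE.findall(body)]
    text = URL_RE.sub(" ", body)

    mentions = []
    # Try longer names first so "Bob Smithson" is not reported as "Bob Smith".
    for name in sorted(known_names, key=len, reverse=True):
        pattern = re.compile(
            r"(?<!\w)@?" + re.escape(name) + r"(?!\w)", re.IGNORECASE
        )
        if pattern.search(text):
            mentions.append(name)
            # Remove the match so "@Bob" is not counted again as a handle.
            text = pattern.sub(" ", text)

    # Remaining single-word @handles.
    mentions.extend(HANDLE_RE.findall(text))
    hashtags = HASHTAG_RE.findall(text)

    return {
        "body": body,
        "mentions": mentions,
        "hashtags": hashtags,
        "hyperlinks": hyperlinks,
    }
```

For a message like "@Bob Smith. please review file#1234 with @alice" and `known_names=["Bob Smith"]`, the idea is that "mentions" would hold the canonical name "Bob Smith" plus the handle "alice", and a search for a mention becomes an exact term match on that field rather than a full-text query on "body". References like file#1234 are deliberately left out of "hashtags" here; they would presumably need their own pattern and field.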
