Re: Getting file text content from mapper?

2014-11-20 Thread David Pilato
So that’s the expected behavior.
The attachment mapper only indexes the extracted content; it never modifies the _source document.

If you want to see the extracted text, you need to store the field and explicitly
ask for it at query time using the fields option.

Have a look here: 
https://github.com/elasticsearch/elasticsearch-mapper-attachments#highlighting-attachments
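A minimal sketch of what that looks like, assuming Elasticsearch 1.x with the mapper-attachments plugin. The index and type names come from this thread; the field name `file` and the search text are hypothetical. The script only builds and prints the two request bodies (mapping and search) rather than sending them.

```python
import json

# Assumption: Elasticsearch 1.x + mapper-attachments plugin.
# "resumes"/"resume" come from the thread; "file" is a hypothetical field name.
INDEX, DOC_TYPE, FIELD = "resumes", "resume", "file"

# The attachment's main sub-field (named like the field itself) holds the
# text Tika extracts. "store": True keeps that text retrievable.
# "term_vector" is only there to support highlighting.
mapping = {
    DOC_TYPE: {
        "properties": {
            FIELD: {
                "type": "attachment",
                "fields": {
                    FIELD: {
                        "type": "string",
                        "store": True,
                        "term_vector": "with_positions_offsets",
                    }
                },
            }
        }
    }
}

# Ask for the stored sub-field explicitly; _source still holds the Base64 string.
search = {
    "fields": [FIELD],
    "query": {"match": {FIELD: "java developer"}},  # hypothetical search text
    "highlight": {"fields": {FIELD: {}}},
}

print(f"PUT /{INDEX}/{DOC_TYPE}/_mapping")
print(json.dumps(mapping, indent=2))
print(f"POST /{INDEX}/{DOC_TYPE}/_search")
print(json.dumps(search, indent=2))
```

Changing `store` on an already-mapped field is generally rejected as a mapping conflict, so this likely means creating the index with this mapping up front and reindexing.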
 



-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr | @scrutmydocs

> On 20 Nov 2014 at 20:14, Raymond Giorgi wrote:
> 
> Also, this is the first line of what's posted along the river
> 
> { "index": {"_index":"resumes","_type":"resume","_id":"2158912"}}
> 
> Things can get truncated when they're as big as a Base64 encoded file :)
> 
> 

-- 
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/8A848658-E1A7-4192-B66C-104D664C7A66%40pilato.fr.
For more options, visit https://groups.google.com/d/optout.


Re: Getting file text content from mapper?

2014-11-20 Thread Raymond Giorgi
Also, this is the first line of what's posted to the river:

{ "index": {"_index":"resumes","_type":"resume","_id":"2158912"}}

Things can get truncated when they're as big as a Base64-encoded file :)
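For reference, a bulk-format message like the one above is an action line followed by a source line, newline-delimited. A sketch in Python (the poster's producer is PHP), assuming the attachment field is named `file` (hypothetical) and using placeholder bytes instead of a real .docx:

```python
import base64
import json


def bulk_payload(index, doc_type, doc_id, field, file_bytes):
    """Build one bulk-format message: an action line plus a source line."""
    action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
    # The attachment field carries the raw file as a Base64 string;
    # Tika decodes and parses it at index time.
    source = {field: base64.b64encode(file_bytes).decode("ascii")}
    # Bulk bodies are newline-delimited JSON and must end with a newline.
    return json.dumps(action) + "\n" + json.dumps(source) + "\n"


# Placeholder bytes stand in for a real .docx file.
payload = bulk_payload("resumes", "resume", "2158912", "file", b"fake docx bytes")
print(payload)
```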


On Wednesday, November 19, 2014 6:01:29 PM UTC-5, Raymond Giorgi wrote:
> The short: I'm trying to extract plaintext from the attachment-mapper.



Re: Getting file text content from mapper?

2014-11-19 Thread David Pilato
What is your mapping?
Could you provide a sample JSON document which is sent to RabbitMQ?
How do you know that the content has not been indexed?
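On the last question: one way to check is to search for a word you know appears in the document. If only the Base64 string had been indexed, an ordinary word from the text would not normally match. A sketch, assuming an Elasticsearch 1.x response shape; the field name `file` and the probe word are hypothetical:

```python
import json

# Hypothetical field name and probe word; pick a word you know is in the file.
FIELD = "file"
PROBE_WORD = "javascript"

# A match query on the attachment field searches the Tika-extracted text.
# "size": 0 because only the hit count matters here.
probe_query = {"query": {"match": {FIELD: PROBE_WORD}}, "size": 0}


def text_was_indexed(response):
    """True if the probe query matched at least one document.

    Assumes an Elasticsearch 1.x search response, where hits.total is an int.
    """
    return response.get("hits", {}).get("total", 0) > 0


print(json.dumps(probe_query))
```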


David

> On 20 Nov 2014 at 00:01, Raymond Giorgi wrote:
> 
> The short: I'm trying to extract plaintext from the attachment-mapper.



Getting file text content from mapper?

2014-11-19 Thread Raymond Giorgi
Hey all,

I'm hoping someone can help me out with something I'm having an issue with.

The short: I'm trying to extract plaintext with the attachment mapper.

The long: I'm posting the Base64-encoded contents of a file to RabbitMQ,
which feeds an Elasticsearch river plugin. Querying against the field
works fine, but it only seems to store the Base64 encoding of the file
instead of the plaintext. I'd like to extract the contents as plaintext and
have that be returnable (i.e., get the text of a .docx back from a query).
I'm feeding it from a PHP front end, so there are places in the app where
I'd like to rely on Elasticsearch's built-in Tika processor.
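As David explains in his reply, the query works because the extracted text is indexed, while _source keeps the Base64 upload unchanged. A minimal sketch of reading each from a search hit, assuming Elasticsearch 1.x response shapes and a hypothetical field named `file`:

```python
import base64


def extracted_text(hit, field="file"):
    """Tika-extracted text of a hit, or None if it was not returned.

    Present only when the sub-field is stored and requested via "fields";
    Elasticsearch 1.x returns each requested field as a list of values.
    """
    values = hit.get("fields", {}).get(field)
    return values[0] if values else None


def original_file(hit, field="file"):
    """Raw uploaded bytes from _source, or None if _source was not returned.

    The mapper never rewrites _source, so it still holds the Base64 upload.
    It may be absent when only "fields" was requested.
    """
    encoded = hit.get("_source", {}).get(field)
    return base64.b64decode(encoded) if encoded is not None else None
```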

Thanks!
