1) 
Since you are a Java dev, I'd recommend using Tika directly in your code to 
extract the data you need and produce JSON that contains exactly what you want 
to index. Something like this: 
https://github.com/dadoonet/fsriver/blob/master/src/main/java/fr/pilato/elasticsearch/river/fs/river/FsRiver.java#L688-L695

That way, you won't need to send a full binary doc to Elasticsearch just to 
index some metadata or raw text.
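
To illustrate the second half of that idea, here is a minimal sketch in plain 
Java of assembling such a lean document once the text and metadata have been 
extracted. The extraction itself (Tika) is left out on purpose. The class name, 
the "content" and "meta" field names, and the sample values are only 
assumptions for illustration; in a real project you would more likely use 
XContentBuilder or a JSON library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds a small JSON document from already-extracted values.
final class LeanDocument {

    // Escapes a string so it can be placed inside a JSON string literal.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Produces {"content":"...","meta":{...}} with no binary payload at all.
    static String toJson(String content, Map<String, String> meta) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"content\":\"").append(escape(content)).append("\",\"meta\":{");
        boolean first = true;
        for (Map.Entry<String, String> e : meta.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
            first = false;
        }
        return sb.append("}}").toString();
    }

    public static void main(String[] args) {
        // Sample values standing in for what the extraction step would return.
        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("title", "Quarterly report");
        meta.put("content_type", "application/pdf");
        System.out.println(toJson("Revenue grew \"slightly\".", meta));
    }
}
```

The resulting string can be used as the document source in place of the 
base64-encoded file.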

That said, you could also look at the _source include/exclude settings: 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-source-field.html#include-exclude
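
As a rough sketch of what such a mapping could look like, the plain-Java 
snippet below only builds the mapping JSON as a string. The class name and the 
"doc" type name are assumptions; "file" is the attachment field from your 
mapping. It assumes the names passed in need no JSON escaping.

```java
// Hypothetical helper: builds a type mapping that keeps `field` out of _source.
final class SourceExcludeMapping {

    // The attachment field is still declared under "properties" so it gets
    // parsed and indexed; "_source.excludes" keeps it out of the stored source.
    static String build(String type, String field) {
        return "{\"" + type + "\":{"
             + "\"_source\":{\"excludes\":[\"" + field + "\"]},"
             + "\"properties\":{\"" + field + "\":{\"type\":\"attachment\"}}"
             + "}}";
    }

    public static void main(String[] args) {
        System.out.println(build("doc", "file"));
    }
}
```

Keep in mind that a field excluded from _source is no longer returned with the 
document, so anything that relies on the original source for that field will 
not see it.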

2)
The mapper attachment plugin never modifies the source document.
But if you ask for stored fields at search time, in addition to the default 
"_source" field, you should get your values back.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-fields.html#search-request-fields
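
For example, a search body asking for stored metadata fields could be built as 
in the sketch below. It is plain Java that only produces the request JSON. The 
class name is made up, and the field names "file.title" and 
"file.content_type" assume the attachment field is called "file" and that 
those sub-fields are mapped with "store" : "yes".

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds a search body that requests stored fields.
final class StoredFieldsRequest {

    // Field names are assumed to be simple paths that need no JSON escaping.
    static String build(List<String> fields) {
        StringBuilder sb = new StringBuilder("{\"fields\":[");
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('"').append(fields.get(i)).append('"');
        }
        return sb.append("],\"query\":{\"match_all\":{}}}").toString();
    }

    public static void main(String[] args) {
        System.out.println(build(Arrays.asList("file.title", "file.content_type")));
    }
}
```

Fields that are not stored in the mapping will simply not come back, so the 
"store" setting on each metadata sub-field matters here.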

HTH

-- 
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr


On 11 July 2014 at 15:53:54, David Marko (dmarko...@gmail.com) wrote:

I'm uploading attachments to be parsed in ES using the Java API. I have ES 1.2.2 
with the elasticsearch-mapper-attachments plugin properly installed. The code 
works fine and I can search by attachment content, but ...

1. The file content is stored in Elasticsearch. Is there a way to avoid 
this, so that the content is indexed but not stored?

I have this mapping code (not full code):

XContentBuilder map = jsonBuilder().startObject()
        .startObject(idxType)
          .startObject("properties")
            .startObject("file")
              .field("type", "attachment")
              .field("store", "no")
            .endObject()
          .endObject()
        .endObject()
      .endObject(); // closes the root object opened on the first line

and I index documents using this:

BytesReference json = jsonBuilder()
        .startObject()
          .field("_id", filePath)
          .field("file", data64) // base64-encoded file content
        .endObject().bytes();

IndexResponse idxResp = client.prepareIndex()
        .setIndex(idxName).setType(idxType).setId(filePath)
        .setSource(json)
        .execute().actionGet();

2) I can't see the file metadata fields described in the docs. I understand 
that they are (or should be) created automatically?

The docs say these fields should appear ...

"fields" : {
    "file" : {"index" : "no"},
    "title" : {"store" : "yes"},
    "date" : {"store" : "yes"},
    "author" : {"analyzer" : "myAnalyzer"},
    "keywords" : {"store" : "yes"},
    "content_type" : {"store" : "yes"},
    "content_length" : {"store" : "yes"},
    "language" : {"store" : "yes"}
}
--
You received this message because you are subscribed to the Google Groups 
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/5a29d66f-99d8-48e4-b93c-7caf61b93214%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

