Hi everyone,
I am working on a project that uses Lucene, and I need to index HTML files. I am using Tika to parse them. However, I need to index each file according to its tags: the text inside each kind of HTML tag (for example <p> or <a>) should be stored in its own field. Can I do that? If so, how? Also, can I assign different weights to the tokens in different fields? If so, how?
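One possible approach, sketched under assumptions: Tika's parsers report a parsed document as XHTML SAX events, so instead of a plain-text handler you could pass your own org.xml.sax ContentHandler to Tika's Parser.parse(InputStream, ContentHandler, Metadata, ParseContext) and collect text per tag. To keep the sketch runnable without Tika or Lucene on the classpath, it feeds a hand-written XHTML string through the JDK's own SAX parser. The names TagFieldsDemo, TagTextHandler and extract are hypothetical, and giving text inside a nested <a> to "a" rather than to the enclosing "p" is a design assumption. Whether a particular tag survives Tika's HTML-to-XHTML mapping is also worth checking for your Tika version.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class TagFieldsDemo {

    /** Hypothetical handler: collects the text of selected tags, one buffer per tag name. */
    static class TagTextHandler extends DefaultHandler {
        private final Set<String> tracked;                        // e.g. "p", "a"
        private final Deque<String> open = new ArrayDeque<>();    // stack of open elements, innermost first
        private final Map<String, StringBuilder> text = new LinkedHashMap<>();

        TagTextHandler(Set<String> tracked) { this.tracked = tracked; }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Tika emits namespaced XHTML, so prefer the local name.
            open.push(localName.isEmpty() ? qName : localName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String name = open.pop();
            // Separate the text of consecutive elements, e.g. two <p> blocks.
            if (tracked.contains(name)) {
                text.computeIfAbsent(name, k -> new StringBuilder()).append(' ');
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Assumption: text goes to the innermost tracked tag (<a> inside <p> goes to "a").
            for (String name : open) {
                if (tracked.contains(name)) {
                    text.computeIfAbsent(name, k -> new StringBuilder()).append(ch, start, length);
                    return;
                }
            }
        }

        /** Returns tag -> text with whitespace collapsed. */
        Map<String, String> fields() {
            Map<String, String> out = new LinkedHashMap<>();
            text.forEach((tag, sb) -> out.put(tag, sb.toString().replaceAll("\\s+", " ").trim()));
            return out;
        }
    }

    /** Parses an XHTML string with the JDK SAX parser (stand-in for Tika here). */
    static Map<String, String> extract(String xhtml, Set<String> tags) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();
        TagTextHandler handler = new TagTextHandler(tags);
        parser.parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.fields();
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
                + "<p>Hello <a href=\"x.html\">world</a> again</p><p>Second para</p>"
                + "</body></html>";
        Map<String, String> fields = extract(xhtml, Set.of("p", "a"));
        // Each entry could then become its own Lucene field, e.g.
        // doc.add(new TextField(tag, value, Field.Store.YES));
        fields.forEach((tag, value) -> System.out.println(tag + " -> " + value));
    }
}
```

For the weighting question, Lucene lets you weight fields at query time, for example with the MultiFieldQueryParser constructor that takes a Map<String, Float> of per-field boosts, or by wrapping a query in a BoostQuery. Older releases also had index-time field boosts (Field.setBoost), but those were removed in Lucene 7, so which option applies depends on your Lucene version.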