Re: Process UnStructured Data in Mahout for Clustering

2014-12-05 Thread Ted Dunning
On Thu, Dec 4, 2014 at 5:38 AM, Shahid Shaikh 
wrote:

> i see the problem is with the way data is written


What exactly do you mean by this?


Re: Process UnStructured Data in Mahout for Clustering

2014-12-04 Thread Brian Dolan
In my experience, it's best to do the data processing in Python.
I strongly suggest you rewrite your ETL and let Mahout do only the clustering.
The built-in vectorization routines are fairly primitive.

Then I would wash the features (basically, set up your own list of stop words
or phrases) before you let Mahout do anything.
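
As a rough illustration of the "wash the features" step above, here is a minimal Python sketch. The stop lists, the sample documents, and the helper names `wash` and `prune_rare` are all hypothetical; the rare-term pruning is an extra step that is not part of the advice above. Treat it as a starting point under those assumptions, not a definitive pipeline.

```python
import re
from collections import Counter

# Hypothetical stop lists; build your own from your corpus.
STOP_PHRASES = ["please find attached", "thanks in advance"]
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and",
              "i", "in", "for", "my", "on"}

TOKEN_RE = re.compile(r"[a-z0-9]+")


def wash(text):
    """Lower-case, drop stop phrases, tokenize, then drop stop words."""
    text = text.lower()
    for phrase in STOP_PHRASES:
        text = text.replace(phrase, " ")
    return [t for t in TOKEN_RE.findall(text) if t not in STOP_WORDS]


def prune_rare(docs_tokens, min_df=2):
    """Keep only terms that occur in at least min_df documents."""
    df = Counter(t for tokens in docs_tokens for t in set(tokens))
    return [[t for t in tokens if df[t] >= min_df] for tokens in docs_tokens]


# Made-up sample documents, for illustration only.
docs = [
    "Please find attached the server logs. The disk is FULL!",
    "Disk full on my server, thanks in advance",
]
washed = [wash(d) for d in docs]
cleaned = prune_rare(washed)
for tokens in cleaned:
    print(" ".join(tokens))
```

The cleaned, space-separated text can then be written out and handed to Mahout for vectorization and clustering, so that Mahout never sees the noisy raw text.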

On Dec 4, 2014, at 8:38 AM, Shahid Shaikh  wrote:

> Hey Donni, thanks, but I have used the configurations and obtained the
> clusters. The results are not promising enough. I was wondering whether
> there are any known techniques I can follow specifically while generating
> vectors.
> 
> Thanks
> 
> On Thursday, December 4, 2014, Donni Khan 
> wrote:
>> Hi
>> It depends on the nature of the data you are clustering. If you know your
>> data, you can make sense of the results and also set the right parameters
>> for the clustering algorithm, such as the number of topics or the number
>> of clusters.
>> 
>> Cheers,
>> Donni
>> 
>> On Thu, Dec 4, 2014 at 2:38 PM, Shahid Shaikh 
>> wrote:
>> 
>>> Hi All,
>>>   I have been trying Mahout clustering on unstructured data, i.e.
>>> human-written data. I have tried Mahout clustering algorithms such as
>>> K-means, Canopy + K-means, and LDA, but the results produced are not
>>> helpful.
>>>
>>> I see the problem is with the way data is written. Can someone please
>>> give me some pointers on how to proceed with unstructured data for
>>> clustering?
>>>
>>>
>>> I have also written an analyzer that uses lowercase and stop-word
>>> filters.
>>> 
>>> thanks :)
>>> 
>>> 
>>> Regards,
>>> Shaikh Shahid G .
>>> +91 9503954781
>>> 
>> 
> 
> -- 
> Regards,
> Shaikh Shahid G .
> +91 9503954781



Re: Process UnStructured Data in Mahout for Clustering

2014-12-04 Thread Shahid Shaikh
Hey Donni, thanks, but I have used the configurations and obtained the
clusters. The results are not promising enough. I was wondering whether there
are any known techniques I can follow specifically while generating vectors.

Thanks

On Thursday, December 4, 2014, Donni Khan 
wrote:
> Hi
> It depends on the nature of the data you are clustering. If you know your
> data, you can make sense of the results and also set the right parameters
> for the clustering algorithm, such as the number of topics or the number
> of clusters.
>
> Cheers,
> Donni
>
> On Thu, Dec 4, 2014 at 2:38 PM, Shahid Shaikh 
> wrote:
>
>> Hi All,
>>   I have been trying Mahout clustering on unstructured data, i.e.
>> human-written data. I have tried Mahout clustering algorithms such as
>> K-means, Canopy + K-means, and LDA, but the results produced are not
>> helpful.
>>
>> I see the problem is with the way data is written. Can someone please
>> give me some pointers on how to proceed with unstructured data for
>> clustering?
>>
>>
>> I have also written an analyzer that uses lowercase and stop-word
>> filters.
>>
>> thanks :)
>>
>>
>> Regards,
>> Shaikh Shahid G .
>> +91 9503954781
>>
>

-- 
Regards,
Shaikh Shahid G .
+91 9503954781


Re: Process UnStructured Data in Mahout for Clustering

2014-12-04 Thread Donni Khan
Hi
It depends on the nature of the data you are clustering. If you know your
data, you can make sense of the results and also set the right parameters
for the clustering algorithm, such as the number of topics or the number of
clusters.

Cheers,
Donni

On Thu, Dec 4, 2014 at 2:38 PM, Shahid Shaikh 
wrote:

> Hi All,
>   I have been trying Mahout clustering on unstructured data, i.e.
> human-written data. I have tried Mahout clustering algorithms such as
> K-means, Canopy + K-means, and LDA, but the results produced are not
> helpful.
>
> I see the problem is with the way data is written. Can someone please
> give me some pointers on how to proceed with unstructured data for
> clustering?
>
>
> I have also written an analyzer that uses lowercase and stop-word filters.
>
> thanks :)
>
>
> Regards,
> Shaikh Shahid G .
> +91 9503954781
>