Matt, I'm curious: how do you intend to extract ingredient information from
the various recipe websites and databases? Generally there are numerous
representations of practically the same ingredient (e.g., "broccoli, raw",
or "fresh broccoli"). How similar are "cumin seed" and "cumin, ground"?
Also, there's sometimes a fine line between the ingredient itself and how
it is prepared ("tomato, diced", "cubed tomato"). How would you separate
those two?
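
Just to make the question concrete, here's a very naive sketch of one way to
split an ingredient phrase into a base ingredient and a preparation. Everything
here, including the PREP_TERMS list, is made up for illustration:

```python
# Naive split of an ingredient phrase into (base ingredient, preparation).
# PREP_TERMS is hand-picked and purely illustrative -- a real system would
# need a much richer vocabulary (or a learned model).
PREP_TERMS = {"diced", "cubed", "ground", "raw", "fresh", "canned",
              "chopped", "minced"}

def split_ingredient(phrase):
    words = phrase.replace(",", " ").split()
    prep = [w for w in words if w.lower() in PREP_TERMS]
    base = [w for w in words if w.lower() not in PREP_TERMS]
    return " ".join(base), " ".join(prep)

print(split_ingredient("tomato, diced"))   # -> ('tomato', 'diced')
print(split_ingredient("fresh broccoli"))  # -> ('broccoli', 'fresh')
print(split_ingredient("cumin seed"))      # -> ('cumin seed', '')
```

Obviously "cumin, ground" vs. "cumin seed" shows the limits of a term list like
this, which is exactly why I'm asking how you plan to handle it.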

Do you plan to build an understanding of how an ingredient is prepared, or
what form it comes in (e.g., fresh or canned), into the model? It would be
interesting to get ingredient substitutions as well.
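
On substitutions: if each ingredient gets a binary SDR back from CEPT,
substitution candidates might fall out of simple SDR overlap. A toy sketch
(the vectors and names here are invented, not real CEPT output):

```python
# Toy sketch: rank substitution candidates by overlap of binary SDRs.
# The vectors below are made-up 8-bit examples, not real CEPT output.
def overlap(a, b):
    return sum(x & y for x, y in zip(a, b))

sdrs = {
    "broccoli":    [1, 1, 0, 1, 0, 0, 1, 0],
    "cauliflower": [1, 1, 0, 0, 0, 0, 1, 0],
    "bacon":       [0, 0, 1, 0, 1, 1, 0, 0],
}

def substitutes(name, k=2):
    # Score every other ingredient by shared active bits, highest first.
    scores = {other: overlap(sdrs[name], sdrs[other])
              for other in sdrs if other != name}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(substitutes("broccoli"))  # -> ['cauliflower', 'bacon']
```
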


On Fri, Apr 11, 2014 at 10:18 AM, Matt Roesener <[email protected]> wrote:

> Hi Chetan,
>
> Thank you, I will experiment with Fluent a bit more!
>
> I'm curious to know whether you or anyone else sees any issues with my
> thought process below.
>
>
> 1. Scrape ingredient data from AllRecipes.com, recipe databases, etc.
>
> 2. For each ingredient in a recipe, send the ingredient to the CEPT API.
>
> 3. Pull the SDR from the CEPT API.
>
> 4. Feed the SDR of the word into the temporal pooler.
>
> 5. Learn sequences of ingredients.
>
> 6. Make 1-, 2-, 3-, 4-, and 5-step predictions of ingredients.
>
>
> What's interesting to me is that certain ingredients come out semantically
> similar when using the SDRs from CEPT.
>
>
> Thanks,
>
> Matt
>
>
> On Thu, Apr 10, 2014 at 5:44 AM, <[email protected]> wrote:
>
>> Send nupic mailing list submissions to
>>         [email protected]
>>
>> To subscribe or unsubscribe, visit
>>         http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
>> or, via email, send a message with subject or body 'help' to
>>         [email protected]
>>
>> You can reach the person managing the list at
>>         [email protected]
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of nupic digest..."
>>
>>
>> Today's Topics:
>>
>>    1. Inhibition radius and SDR formation (Azat)
>>    2. Re: Food Classification/ Ingredient Prediction (Chetan Surpur)
>>    3. Re: Swarming process using a large amount of memory
>>       (Scheele, Manuel)
>>    4. Re: Swarming process using a large amount of memory (Marek Otahal)
>>    5. Re: nupic Digest, Vol 12, Issue 20 (Pedro Tabacof)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Wed, 9 Apr 2014 21:55:26 -0700 (PDT)
>> From: Azat <[email protected]>
>> To: [email protected]
>> Subject: [nupic-discuss] Inhibition radius and SDR formation
>> Message-ID:
>>         <[email protected]>
>> Content-Type: text/plain; charset=us-ascii
>>
>> Hello,
>>
>>   I've got some questions on SDR formation (at the initial stages of
>> learning in a CLA region) and on predictions based on already-available
>> SDRs (after some significant learning time):
>>
>> Is my understanding correct that a "proper" inhibition radius is (one
>> of?) the most important parameters in building a good new SDR for some
>> novel input? Is it currently chosen so that we get meaningful SDRs,
>> which can later be used for predictions that make sense to us (i.e.,
>> close to real-world expectations)?
>>
>> [Leaving out for now other mechanisms like thresholds, bursting, number
>> of proximal connections etc - let's suppose they're all given and fixed]
>>
>> Azat
>>
>>
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Wed, 9 Apr 2014 22:17:26 -0700
>> From: Chetan Surpur <[email protected]>
>> To: "NuPIC general mailing list." <[email protected]>
>> Subject: Re: [nupic-discuss] Food Classification/ Ingredient
>>         Prediction
>> Message-ID:
>>         <CAD1_crnfc625XNaqcJw2KdXgtbPafx8p7ang=
>> [email protected]>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>>
>> Hi Matt,
>>
>> This indeed sounds like a great fit for NuPIC and CEPT. Check out Fluent
>> (webapp [1], server-based API [2], Python library [3]), it was made for
>> this!
>>
>> [1] http://fluent.cept.at/
>> [2] https://github.com/numenta/nupic.fluent.server
>> [3] https://github.com/numenta/nupic.fluent
>>
>> - Chetan
>>
>>
>> On Wed, Apr 9, 2014 at 6:12 PM, Matt Roesener <[email protected]>
>> wrote:
>>
>> > Hello everyone,
>> >
>> > While experimenting with the NLP projects using CEPT and the temporal
>> > pooler, I came across an idea. Could I feed into the temporal pooler
>> > ingredients data such as "tomato, onion..." or "eggs, bacon, potatoes",
>> > learn the sequences, and make predictions of the next ingredient? My
>> > thinking is to learn food associations from thousands of existing
>> > recipes, but also to create new food associations or recipes.
>> >
>> > Is this logic reasonable to accomplish using NuPIC, or will additional
>> > regions be needed?
>> >
>> > Any thought process would be much appreciated!
>> >
>> > Thank you,
>> >
>> > Matt
>> >
>> >
>> >
>> >
>> >
>>
>> ------------------------------
>>
>> Message: 3
>> Date: Thu, 10 Apr 2014 10:31:05 +0000
>> From: "Scheele, Manuel" <[email protected]>
>> To: NuPIC general mailing list. <[email protected]>
>> Subject: Re: [nupic-discuss] Swarming process using a large amount of
>>         memory
>> Message-ID:
>>         <[email protected]>
>> Content-Type: text/plain; charset="us-ascii"
>>
>> Hi Ritchie,
>>
>> Unfortunately, I can't help you as to why your swarm uses up so much
>> memory, but I have swarmed over data files with sizes of 8MB without
>> problems (it only takes some time, as you would expect). I have 6GB of RAM.
>> According to the resources monitor the swarming process uses about 320MB
>> initially and grows to a total of 460MB.
>>
>> However, I don't think file size matters too much when swarming. What is
>> more relevant is the number of fields in your file and the two things are
>> not necessarily connected (a field can have any byte size, so a large file
>> may not indicate a large number of fields). But I am not too confident
>> about this. Let's see what the rest of the community has to say about it ;).
>>
>> A workaround for now would be to limit the number of lines you swarm over
>> (in search_def.json), but that is equivalent to swarming over a smaller
>> file.
>>
>> Manuel
>>
>>
>>
>>
>> ________________________________________
>> From: nupic [[email protected]] on behalf of Ritchie Lee [
>> [email protected]]
>> Sent: 09 April 2014 22:07
>> To: [email protected]
>> Subject: [nupic-discuss] Swarming process using a large amount of memory
>>
>> Hi friends of NuPIC!
>>
>> I have been running swarms on csv data files that are around 3 megabytes
>> in size, and I have found that swarming uses more than 6 gigabytes of RAM
>> during the process.  If I run swarms on data files that are larger than
>> that, my computer runs out of RAM and hangs (I have 8 GB of RAM).  In
>> particular, I tried swarming on a 13 MB data file and it froze very
>> quickly.  Memory usage seems to climb monotonically during the swarming
>> process, and is released all at once on completion.
>>
>> I am wondering whether anyone has had experience swarming large (>10 MB)
>> csv files, and what your experience with memory consumption has been.
>> Ideally I'd like to be able to swarm over much larger datasets (on the
>> order of a hundred megs).
>>
>> Thanks,
>>
>> Ritchie Lee
>> Research Engineer
>> Carnegie Mellon University-Silicon Valley
>> NASA Ames Research Center
>> Bldg 23, Rm 115
>> Moffett Field, CA 94035
>> (650) 335-2847
>>
>>
>> ------------------------------
>>
>> Message: 4
>> Date: Thu, 10 Apr 2014 13:21:03 +0200
>> From: Marek Otahal <[email protected]>
>> To: "NuPIC general mailing list." <[email protected]>
>> Subject: Re: [nupic-discuss] Swarming process using a large amount of
>>         memory
>> Message-ID:
>>         <
>> cach1_rq8kvp0gwwcprdexr+jb7pqv_m_tgylolf1y9npiwe...@mail.gmail.com>
>> Content-Type: text/plain; charset="utf-8"
>>
>> Hi Ritchie,
>>
>> I did some experiments (not swarming) with large-scale CLAs; it turns out
>> Python objects take a lot of memory.
>> Your memory demands will depend on the encoders you use, the size and
>> type of the input data, and the implementation of the spatial pooler and
>> temporal pooler (currently we have two: Python and C++). Could you share
>> these details, or better yet the OPF settings file where you describe the
>> structure of the data to NuPIC (description.py or something similar)?
>>
>> Cheers,
>> Mark
>>
>>
>> --
>> Marek Otahal :o)
>>
>> ------------------------------
>>
>> Message: 5
>> Date: Thu, 10 Apr 2014 09:44:03 -0300
>> From: Pedro Tabacof <[email protected]>
>> To: "NuPIC general mailing list." <[email protected]>
>> Subject: Re: [nupic-discuss] nupic Digest, Vol 12, Issue 20
>> Message-ID:
>>         <
>> caakowudzi30hzqx0m11p-lmr2aj5b-rw2jdmqrcqz7bsyep...@mail.gmail.com>
>> Content-Type: text/plain; charset="iso-8859-1"
>>
>> Ritchie, those are interesting results. I remember reading an "Anomaly
>> Memo" on the Wiki a long time ago (it's no longer there) where some
>> problems with the anomaly score were discussed, and I believe there was
>> something related to the problem you found.
>>
>> What exactly do we lose by setting maxBoost to 1?
>>
>> Regards,
>> Pedro.
>>
>>
>> On Wed, Apr 9, 2014 at 11:55 PM, Ritchie Lee <[email protected]>
>> wrote:
>>
>> > I did some testing and found that it is description.py that actually
>> > matters.  I believe model_params.py is produced for external clients.
>> > Following Subutai's suggestion, setting maxBoost = 1.0 solved it, so
>> > boosting was doing something funny to the anomaly score.  It works well
>> > with boosting disabled.
>> >
>> > Ritchie
>> >
>> > [image: Inline image 1]
>> >
>> > Ritchie Lee
>> > Research Engineer
>> >  Carnegie Mellon University-Silicon Valley
>> > NASA Ames Research Center
>> > Bldg 23, Rm 115
>> > Moffett Field, CA 94035
>> > (650) 335-2847
>> >
>> >
>> > On Wed, Apr 9, 2014 at 7:44 PM, Ritchie Lee <[email protected]
>> >wrote:
>> >
>> >> Hi Subutai,
>> >>
>> >> Which file should I add that field to? Is it model_params.py? or
>> >> description.py? or something under savedmodels/?
>> >>
>> >> Thanks,
>> >>
>> >> Ritchie Lee
>> >> Research Engineer
>> >> Carnegie Mellon University-Silicon Valley
>> >> NASA Ames Research Center
>> >> Bldg 23, Rm 115
>> >> Moffett Field, CA 94035
>> >> (650) 335-2847
>> >>
>> >>
>> >>
>> >>>
>> >>> Message: 2
>> >>> Date: Wed, 9 Apr 2014 17:20:23 -0700
>> >>> From: Subutai Ahmad <[email protected]>
>> >>> To: "NuPIC general mailing list." <[email protected]>
>> >>> Subject: Re: [nupic-discuss] Oddity in sine wave example
>> >>> Message-ID:
>> >>>         <
>> >>> ca+zatijydecwuwrprpendxvdvpsitouixnjxs8kxazmhta+...@mail.gmail.com>
>> >>> Content-Type: text/plain; charset="iso-8859-1"
>> >>>
>> >>> Hi Ritchie,
>> >>>
>> >>> I'm not really sure why it is doing this. One possibility is
>> >>> boosting, since that starts to kick in after a couple of thousand
>> >>> iterations. Can you try re-running the experiment with maxBoost set
>> >>> to 1 and see if you get the same behavior? You'll want to add a
>> >>> field called 'maxBoost' under 'spParams'.
>> >>>
>> >>> --Subutai
>> >>>
>> >>>
>> >>> On Wed, Apr 9, 2014 at 1:49 PM, Ritchie Lee <[email protected]>
>> >>> wrote:
>> >>>
>> >>> > Hi friends of NuPIC!
>> >>> >
>> >>> > I am a researcher currently working on an experimental project
>> >>> > applying NuPIC to a large dataset for multivariate time series
>> >>> > anomaly detection, and I have been running into various
>> >>> > difficulties getting it to work.  I hope this is a good forum to
>> >>> > share these questions and see if anyone has experienced similar
>> >>> > problems or can offer any advice.  Thanks in advance!
>> >>> >
>> >>> > I was following Matt's sine wave screencast tutorial:
>> >>> > http://www.youtube.com/watch?v=KuFfm3ncEwI
>> >>> >
>> >>> > with the exception that I made ROWS=8000 instead of 3000, and I
>> >>> > found the following:
>> >>> >
>> >>> >
>> >>> > [image: Inline image 1]
>> >>> > Subplot1 is original
>> >>> > Subplot2 is predicted
>> >>> > Subplot3 is anomaly score
>> >>> >
>> >>> > I'm wondering why the anomaly score is acting oddly: after initial
>> >>> > learning, the anomaly score flatlines... but then activity spikes
>> >>> > back up again.  I have verified that the data is indeed periodic,
>> >>> > i.e., the same pattern is being shown to the CLA over and over
>> >>> > again.  Why is this happening? Has anyone else tried this and run
>> >>> > into this problem?
>> >>> >
>> >>> > Thanks,
>> >>> >
>> >>> > Ritchie
>> >>> >
>> >>> > Ritchie Lee
>> >>> > Research Engineer
>> >>> > Carnegie Mellon University-Silicon Valley
>> >>> > NASA Ames Research Center
>> >>> > Bldg 23, Rm 115
>> >>> > Moffett Field, CA 94035
>> >>> > (650) 335-2847
>>
>> >>> >
>> >>>
>> >>> ------------------------------
>> >>>
>> >>>
>> >>>
>> >>> ------------------------------
>> >>>
>> >>> End of nupic Digest, Vol 12, Issue 20
>> >>> *************************************
>>
>> >>>
>> >>
>> >>
>> >
>> >
>>
>>
>> --
>> Pedro Tabacof
>>
>> ------------------------------
>>
>>
>> ------------------------------
>>
>> End of nupic Digest, Vol 12, Issue 22
>> *************************************
>>
>
>
>
> --
> Regards,
>
> Matthew W. Roesener
> Tel: +808.542.9978
> Email: [email protected]
> LinkedIn: http://www.linkedin.com/in/matthewroesener
>
>
>
>
_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
