Re: Swarming issue

Richard Crowder Tue, 03 Nov 2015 06:06:29 -0800

Hi Ryan. We're hoping that this may just be an environment configuration
issue, with the PATH finding Python 2.6 before 2.7. When you have a chance,
it'd be great to find out if that is the case. We don't think it is the
'bash -c' removal in the commit I mentioned. Many thanks, Richard.
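One way to check the PATH-ordering hypothesis is to list every `python` executable on PATH in the order it is searched. A plain "python" launched without a shell typically resolves to the first entry. This is a minimal sketch that assumes a POSIX-style PATH. `python_candidates` is a hypothetical helper, not part of NuPIC. On Python 3.3+, `shutil.which("python")` returns only the first match.

```python
import os


def python_candidates(name="python", path=None):
    """Return every executable file called `name` on PATH, in search order."""
    path = os.environ.get("PATH", "") if path is None else path
    found = []
    for directory in path.split(os.pathsep):  # ':' on Linux, ';' on Windows
        # An empty PATH entry means the current directory on POSIX systems.
        candidate = os.path.join(directory or os.curdir, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            found.append(candidate)
    return found


if __name__ == "__main__":
    for exe in python_candidates():
        print(exe)
```

If a python2.6 directory is listed before the python2.7 one, that would be consistent with the worker picking up 2.6.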


On Mon, Nov 2, 2015 at 9:06 AM, Richard Crowder wrote:

> Excellent Ryan! Thanks for digging into this.
>
> It looks like a change I made recently, tested in the usual way but on
> simpler Linux environments (incl. Travis), has affected this [1]. Before
> that change (which stopped using "bash -c" in the subprocess call that
> launches workers), things may have been working fine for you. Since the
> change, the subprocess no longer picks up the right python for your shell
> environment. An alternative method exists for that change, so we'll need to
> work out whether that is good to go. I'll discuss this later today with Scott.
>
> Best regards, Richard.
>
> [1] https://github.com/rcrowder/nupic/commit/1ee717ee6ed27c65d21a6312089170f170e960d8
>
> On Mon, Nov 2, 2015 at 1:56 AM, Ryan J. McCall wrote:
>
>> Aha, I found the issue. The child process (running HypersearchWorker.py)
>> was picking up python2.6, which is installed on the machine. There is a
>> hard-coded command line containing "python" in the
>> permutations_runner.py code, and when I switched it to "python2.7" it worked.
>> Here's the line I changed in the current code:
>>
>>
>> https://github.com/numenta/nupic/blob/master/src/nupic/swarming/permutations_runner.py#L676
>>
>> Is there a standard way of telling a Linux machine which python to
>> use? I suppose that would be the best solution. I had made an alias in my
>> bashrc to set "python" to version 2.7, but clearly that doesn't apply to
>> subprocesses. If you can't specify this, then it seems we want the "python"
>> command to be configurable, or detectable from the system.
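A common way to make the worker's interpreter detectable rather than hard-coded is to launch the child with `sys.executable`, which is the interpreter running the parent. The following is a minimal sketch under that assumption. `worker_command`, `child_python_version` and the `NUPIC_PYTHON` override are hypothetical names, not NuPIC's actual code or configuration. The script name in the usage line is only illustrative.

```python
import os
import subprocess
import sys


def worker_command(*args):
    """Build a child command line that reuses the parent's interpreter.

    NUPIC_PYTHON is a hypothetical override, not a real NuPIC setting.
    """
    python = os.environ.get("NUPIC_PYTHON") or sys.executable
    return [python] + list(args)


def child_python_version():
    """Start a child via worker_command() and return its (major, minor)."""
    cmd = worker_command("-c", "import sys; print('%d.%d' % sys.version_info[:2])")
    out = subprocess.check_output(cmd).decode().strip()
    major, minor = out.split(".")
    return int(major), int(minor)


if __name__ == "__main__":
    print(worker_command("HypersearchWorker.py"))
    print(child_python_version())  # matches the parent's version
```

This also explains why the bashrc alias had no effect. Bash expands aliases only in interactive shells, so a subprocess never sees them. Environment variables such as PATH, by contrast, are inherited by child processes.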
>>
>> On Sun, Nov 1, 2015 at 2:30 PM, Richard Crowder wrote:
>>
>>> "linux2" looks fine for the handlers, since they use
>>> startswith("linux"), so it's not likely to be that. The only other thing I
>>> needed to do was to delete the generated swarming files.
>>> So I'm out of ideas as to why I could get it to work on Windows and you can't :(
>>> Unless it's something to do with different bindings versions or some other
>>> Python package. Locally I have nupic 0.3.6.dev0 and nupic.bindings 0.2.2
>>> and a variety of other Python packages.
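The prefix check mentioned above can be sketched as follows. Matching on the "linux" prefix accepts both "linux2" (Python 2) and "linux" (Python 3.3 and later). `platform_family` is a hypothetical helper, not the actual code in src\nupic\support\__init__.py.

```python
import sys


def platform_family(platform=None):
    """Map a sys.platform string to a coarse family (hypothetical helper)."""
    p = (platform if platform is not None else sys.platform).lower()
    if p.startswith("linux"):  # "linux2" on Python 2, "linux" on Python 3.3+
        return "linux"
    if p.startswith("win"):    # "win32", even on 64-bit Windows
        return "windows"
    if p == "darwin":
        return "darwin"
    return "other"
```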
>>>
>>> Does "import os; print os.pathsep" print a colon? I imagine it
>>> does. I'll try an Ubuntu VM though.
>>>
>>>
>>> On Sun, Nov 1, 2015 at 10:08 PM, Ryan J. McCall wrote:
>>>
>>>> Hi Richard,
>>>>
>>>> Thanks for the reply. I'm not sure what I might change regarding the
>>>> log handlers. (I see that there is a default logging conf file that I can
>>>> override in my NTA_CONF_PATH.) In my script I'm able to say:
>>>>
>>>> from nupic.support import initLogging
>>>> initLogging()
>>>>
>>>> and I see a difference in the messages logged to console.
>>>>
>>>> The swarm-generated files don't seem to be the problem.
>>>>
>>>> "import sys; print sys.platform.lower()" gives "linux2"
>>>>
>>>> Best,
>>>>
>>>> Ryan
>>>>
>>>> On Sun, Nov 1, 2015 at 3:19 AM, Richard Crowder wrote:
>>>>
>>>>> Hi Ryan,
>>>>>
>>>>> I've just updated my nupic.core and nupic forks with the latest from
>>>>> Numenta master, and faced the exact same problem (but on Windows). I
>>>>> needed to do two things: update the sys and file log handlers to support
>>>>> win32 (src\nupic\support\__init__.py), and delete the files generated
>>>>> during the run of the 'simple' swarming test (with one worker, i.e. no
>>>>> --maxWorkers on the command line). Those changes MAY only be related to
>>>>> the Windows porting, but here are a few things to try:
>>>>>
>>>>> - Check what the Python command "import sys; print sys.platform.lower()"
>>>>>   outputs.
>>>>> - Clean up the files generated by the swarm (for me, those were
>>>>>   description.py, permutations.py, the model_0/ directory, and a .pkl
>>>>>   and a .csv file).
>>>>> - Use the --overwrite flag when swarming with scripts\run_scripts.py.
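The cleanup step can be scripted. This is a minimal sketch. `clean_swarm_outputs` is a hypothetical helper, and its default names are the ones listed above. The exact .pkl and .csv file names are not given here, so they must be passed explicitly through `extra` rather than matched by a wildcard. That way, input data such as a training CSV is never deleted by accident.

```python
import os
import shutil

# Generated names reported above; the .pkl/.csv names vary, so pass them in `extra`.
DEFAULT_GENERATED = ("description.py", "permutations.py", "model_0")


def clean_swarm_outputs(directory, extra=(), dry_run=False):
    """Remove known swarm-generated files; return the paths (to be) removed."""
    removed = []
    for name in tuple(DEFAULT_GENERATED) + tuple(extra):
        path = os.path.join(directory, name)
        if not os.path.lexists(path):
            continue  # nothing to clean for this name
        if not dry_run:
            if os.path.isdir(path) and not os.path.islink(path):
                shutil.rmtree(path)  # e.g. the model_0/ directory
            else:
                os.remove(path)
        removed.append(path)
    return removed
```

Running it once with `dry_run=True` shows what would be deleted before anything is removed.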
>>>>>
>>>>> I'd be interested to see the sys.platform output.
>>>>>
>>>>> Regards, Richard.
>>>>>
>>>>>
>>>>> On Sun, Nov 1, 2015 at 1:02 AM, Ryan J. McCall wrote:
>>>>>
>>>>>> Hello NuPIC,
>>>>>>
>>>>>> I'm having an issue with swarming on a RHEL box. I've installed NuPIC
>>>>>> version 0.3.1. I have MySQL running and have confirmed that DB
>>>>>> connections can be made with the test_db.py script. The error I'm
>>>>>> getting is similar to ones in some other threads (traceback below). The
>>>>>> hypersearch finishes quickly, evaluates 0 models and throws an exception
>>>>>> because there's no result to load. I would appreciate any suggestions.
>>>>>> Based on my debugging, it looks like jobs are added to the DB. My
>>>>>> thought is to debug the HypersearchWorkers next, which run as separate
>>>>>> processes; I'll have to figure out how to do that...
>>>>>>
>>>>>> Many thanks,
>>>>>>
>>>>>> Ryan
>>>>>>
>>>>>>
>>>>>> Successfully submitted new HyperSearch job, jobID=1020
>>>>>> Evaluated 0 models
>>>>>> HyperSearch finished!
>>>>>> Worker completion message: None
>>>>>>
>>>>>> Results from all experiments:
>>>>>> ----------------------------------------------------------------
>>>>>> Generating experiment files in directory: /tmp/tmp0y39RS...
>>>>>> Writing  313 lines...
>>>>>> Writing  114 lines...
>>>>>> done.
>>>>>> None
>>>>>> json.loads(jobInfo.results) raised an exception.  Here is some info
>>>>>> to help with debugging:
>>>>>> jobInfo:  _jobInfoNamedTuple(jobId=1020, client=u'GRP',
>>>>>> clientInfo=u'', clientKey=u'', cmdLine=u'$HYPERSEARCH',
>>>>>> params=u'{"hsVersion": "v2", "maxModels": null, "persistentJobGUID":
>>>>>> "1a3c7950-8032-11e5-8a23-a0d3c1f9d4f4", "useTerminators": false,
>>>>>> "description": {"includedFields": [{"fieldName": "time", "fieldType":
>>>>>> "datetime"}, {"maxValue": 50000, "fieldName": "volume", "fieldType": "int",
>>>>>> "minValue": 0}], "streamDef": {"info": "rp3_volume", "version": 1,
>>>>>> "streams": [{"info": "rp3_volume", "source":
>>>>>> "file:///home/rmccall/experiment/projects/rp3/rp3-training_data.csv",
>>>>>> "columns": ["*"]}]}, "inferenceType": "TemporalAnomaly", "inferenceArgs":
>>>>>> {"predictionSteps": [1], "predictedField": "volume"}, "iterationCount": -1,
>>>>>> "swarmSize": "small"}}',
>>>>>> jobHash='\x1a<\x81R\x802\x11\xe5\x8a#\xa0\xd3\xc1\xf9\xd4\xf4',
>>>>>> status=u'notStarted', completionReason=None, completionMsg=None,
>>>>>> workerCompletionReason=u'success', workerCompletionMsg=None, cancel=0,
>>>>>> startTime=None, endTime=None, results=None, engJobType=u'hypersearch',
>>>>>> minimumWorkers=1, maximumWorkers=8, priority=0, engAllocateNewWorkers=1,
>>>>>> engUntendedDeadWorkers=0, numFailedWorkers=0,
>>>>>> lastFailedWorkerErrorMsg=None, engCleaningStatus=u'notdone',
>>>>>> genBaseDescription=None, genPermutations=None,
>>>>>> engLastUpdateTime=datetime.datetime(2015, 11, 1, 0, 47, 18),
>>>>>> engCjmConnId=None, engWorkerState=None, engStatus=None,
>>>>>> engModelMilestones=None)
>>>>>> jobInfo.results:  None
>>>>>> EXCEPTION:  expected string or buffer
>>>>>> Traceback (most recent call last):
>>>>>>   File "/usr/local/lib/python2.7/pdb.py", line 1314, in main
>>>>>>     pdb._runscript(mainpyfile)
>>>>>>   File "/usr/local/lib/python2.7/pdb.py", line 1233, in _runscript
>>>>>>     self.run(statement)
>>>>>>   File "/usr/local/lib/python2.7/bdb.py", line 400, in run
>>>>>>     exec cmd in globals, locals
>>>>>>   File "<string>", line 1, in <module>
>>>>>>   File "htmAnomalyDetection.py", line 2, in <module>
>>>>>>     import argparse
>>>>>>   File "htmAnomalyDetection.py", line 314, in main
>>>>>>     runSwarming(args.nupicDataPath, args.projectName, args.maxWorkers, args.overwrite)
>>>>>>   File "htmAnomalyDetection.py", line 164, in runSwarming
>>>>>>     "overwrite": overwrite})
>>>>>>   File "/usr/local/lib/python2.7/site-packages/nupic/swarming/permutations_runner.py", line 277, in runWithConfig
>>>>>>     return _runAction(runOptions)
>>>>>>   File "/usr/local/lib/python2.7/site-packages/nupic/swarming/permutations_runner.py", line 218, in _runAction
>>>>>>     returnValue = _runHyperSearch(runOptions)
>>>>>>   File "/usr/local/lib/python2.7/site-packages/nupic/swarming/permutations_runner.py", line 161, in _runHyperSearch
>>>>>>     metricsKeys=search.getDiscoveredMetricsKeys())
>>>>>>   File "/usr/local/lib/python2.7/site-packages/nupic/swarming/permutations_runner.py", line 826, in generateReport
>>>>>>     results = json.loads(jobInfo.results)
>>>>>>   File "/usr/local/lib/python2.7/site-packages/nupic/swarming/object_json.py", line 163, in loads
>>>>>>     json.loads(s, object_hook=objectDecoderHook, **kwargs))
>>>>>>   File "/usr/local/lib/python2.7/json/__init__.py", line 351, in loads
>>>>>>     return cls(encoding=encoding, **kw).decode(s)
>>>>>>   File "/usr/local/lib/python2.7/json/decoder.py", line 366, in decode
>>>>>>     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
>>>>>> TypeError: expected string or buffer
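The final TypeError is what `json.loads` raises when it is given `None`. With 0 models evaluated, `jobInfo.results` is `None`, as the "jobInfo.results:  None" line above shows. A guard like the following would turn that into a clearer outcome. `load_job_results` is a hypothetical helper, not NuPIC's `generateReport` code. It only hides the symptom; in this thread, the underlying cause turned out to be the workers running under python2.6.

```python
import json


def load_job_results(results):
    """Parse a job's JSON results, tolerating a job that produced none.

    json.loads(None) raises TypeError ("expected string or buffer" on
    Python 2.7), which is the error at the end of the traceback above.
    """
    if results is None:
        return None  # e.g. no models were evaluated, so nothing was recorded
    return json.loads(results)
```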
>>>>>>
>>>>>> --
>>>>>> Ryan J. McCall
>>>>>> ryanjmccall.com
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Ryan J. McCall
>>>> ryanjmccall.com
>>>>
>>>
>>>
>>
>>
>> --
>> Ryan J. McCall
>> ryanjmccall.com
>>
>
>
