Re: GSoC 2015: Raspberry Pi 2 Support

2015-06-22 Thread Alan Cudmore
Great news on the Pi 2 cache configuration!
I am looking forward to a patch to try.

Alan



Re: GSoC 2015: Raspberry Pi 2 Support

2015-06-21 Thread Rohini Kulkarni
Hello,

Are these the relevant functions from
~/rtems/cpukit/score/cpu/arm/rtems/score/cpu.h?
_CPU_SMP_Get_current_processor()
_CPU_SMP_Send_interrupt()
_CPU_SMP_Processor_event_broadcast()
_CPU_SMP_Processor_event_receive()

I am unable to understand how
~/rtems/cpukit/score/cpu/no_cpu/rtems/score/no_cpu.h can be used.

And am I right if I say these are in addition to what I have identified?

Thanks!


On Sun, Jun 21, 2015 at 12:50 PM, Sebastian Huber 
sebastian.hu...@embedded-brains.de wrote:

 Hello Rohini,

 the CPU functions relevant for SMP are documented in the no_cpu/cpu.h file.

 On 20 Jun 2015 at 22:02, Rohini Kulkarni
 krohini1...@gmail.com wrote:

 Hi,

 I have added an SMP-related post to my blog to define where exactly in the
 code I need to work. Some feedback on whether I am identifying the work
 area correctly would be very helpful!




-- 
Rohini Kulkarni
___
devel mailing list
devel@rtems.org
http://lists.rtems.org/mailman/listinfo/devel

Re: GSoC 2015: Raspberry Pi 2 Support

2015-06-21 Thread Gedare Bloom
On Sun, Jun 21, 2015 at 3:04 PM, Rohini Kulkarni krohini1...@gmail.com wrote:
 I missed mentioning the number of dhrystones in the previous mail.

 Originally it was 1 million.
 The new number of dhrystones I executed is 100 million.

The next step is to figure out which changes are contributing to
the performance improvement, and then prepare patches. :) Great work!


Re: GSoC 2015: Raspberry Pi 2 Support

2015-06-21 Thread Rohini Kulkarni
Hi all,

I have managed to get a significant performance improvement with some
configuration changes.

For the original number of dhrystones, the measured time dropped from 12 to
a value too small to be measured.

For the larger number of dhrystones, the time was 0.4.

The number of dhrystones per second increased from approximately 8 to
250 :)

Thanks!


Re: GSoC 2015: Raspberry Pi 2 Support

2015-06-21 Thread Rohini Kulkarni
:)
Very little code had to be added.
I need to clean up the code and add a conditional call for the Pi 2. Then I
will be ready to submit a patch.

Re: GSoC 2015: Raspberry Pi 2 Support

2015-06-21 Thread Rohini Kulkarni
I missed mentioning the number of dhrystones in the previous mail.

Originally it was 1 million.
The new number of dhrystones I executed is 100 million.
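
As a rough cross-check of the figures in this thread, the rate is simply the number of runs divided by the elapsed time. The sketch below is only an illustration: it assumes the reported times (12 and 0.4) are in seconds, which the mails do not state, and the helper names are hypothetical rather than part of RTEMS or the Dhrystone sources.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: benchmark rate as runs divided by elapsed time.
 * The time unit (seconds) is an assumption, not stated in the thread. */
double dhrystones_per_second(double runs, double seconds)
{
    assert(seconds > 0.0);
    return runs / seconds;
}

/* Print the rate before and after the configuration changes, using the
 * run counts (1 million, 100 million) and times (12, 0.4) from the mails. */
void print_rates(void)
{
    printf("before: %.0f dhrystones/s\n", dhrystones_per_second(1e6, 12.0));
    printf("after:  %.0f dhrystones/s\n", dhrystones_per_second(1e8, 0.4));
}
```

Under that assumption the two rates come out at roughly 83 thousand and 250 million dhrystones per second, a ratio of about 3000.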


Re: GSoC 2015: Raspberry Pi 2 Support

2015-06-21 Thread Sebastian Huber
Hello Rohini, 

the CPU functions relevant for SMP are documented in the no_cpu/cpu.h file. 

On 20 Jun 2015 at 22:02, Rohini Kulkarni krohini1...@gmail.com wrote:

 Hi,

 I have added an SMP-related post to my blog to define where exactly in the
 code I need to work. Some feedback on whether I am identifying the work area
 correctly would be very helpful!

Re: GSoC 2015: Raspberry Pi 2 Support

2015-06-20 Thread Rohini Kulkarni
Hi,

I have added an SMP-related post to my blog to define where exactly in the
code I need to work. Some feedback on whether I am identifying the work
area correctly would be very helpful!

Thanks!

 On Mon, Jun 15, 2015 at 8:29 PM, Alan Cudmore alan.cudm...@gmail.com
 wrote:

 Hi,
 Some of the code examples may give you some clues. Like this one:
 https://github.com/mrvn/test/blob/master/smp.cc

 Or this:
 https://github.com/PeterLemon/RaspberryPi/tree/master/SMP/SMPINIT

 If you still can't figure it out, you can always join the raspberrypi.org
 forums and ask on this thread:
 https://www.raspberrypi.org/forums/viewtopic.php?f=72&t=98904

 When it comes to the Pi 2 and SMP, you are our RTEMS expert :)

 Thanks,
 Alan


 On Sat, Jun 13, 2015 at 2:29 PM, Rohini Kulkarni krohini1...@gmail.com
 wrote:

 Hi,

 This is regarding Pi 2 SMP support. After powering on, the secondary
 cores read one of their four mailbox registers and wait for a non-zero
 value to be written. This value is the physical address from which the
 cores are expected to start execution.

 I am stuck at figuring out this address. How should I go about
 understanding this?

 Thanks!
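
The mailbox handshake described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the RTEMS implementation: the register base 0x40000000 and the write-set layout are taken from the BCM2836 ARM-local peripherals description rather than from this thread, the choice of mailbox 3 is an assumption about the firmware stub, and all function names are hypothetical. The start address would be whatever physical address the secondary-core entry code is linked at; 0x8000 in the usage note is only a placeholder.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed from the BCM2836 ARM-local peripherals description, not from
 * this thread: local register base and mailbox write-set layout. */
#define BCM2836_LOCAL_BASE     0x40000000u
#define MAILBOX_WRITE_SET_BASE (BCM2836_LOCAL_BASE + 0x80u)

/* Address of the write-set register of mailbox `mbox` (0..3) of `core` (0..3). */
uint32_t core_mailbox_write_set(uint32_t core, uint32_t mbox)
{
    assert(core < 4u && mbox < 4u);
    return MAILBOX_WRITE_SET_BASE + 0x10u * core + 0x4u * mbox;
}

/* The MMIO store is passed in so that the sketch can be exercised on a host. */
typedef void (*write32_fn)(uint32_t address, uint32_t value);

/* Hypothetical helper: hand a secondary core (1..3) its entry address.
 * Mailbox 3 is an assumption about which mailbox the waiting cores poll. */
void start_secondary_core(uint32_t core, uint32_t entry, write32_fn write32)
{
    assert(core >= 1u && core <= 3u);
    assert(entry != 0u); /* zero means "keep waiting" */
    write32(core_mailbox_write_set(core, 3u), entry);
}

/* Host-side stand-in for an MMIO store; on hardware this would be a
 * volatile 32-bit write to `address`. */
uint32_t last_address;
uint32_t last_value;

void record_write32(uint32_t address, uint32_t value)
{
    last_address = address;
    last_value = value;
}
```

For example, `start_secondary_core(1, 0x8000u, record_write32)` records a write of the placeholder address 0x8000 to the mailbox 3 write-set register of core 1.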
 On 3 Jun 2015 19:44, Gedare Bloom ged...@gwu.edu wrote:

 On Wed, Jun 3, 2015 at 2:39 AM, Rohini Kulkarni krohini1...@gmail.com
 wrote:
  But, I can't say cache configurations have a role here.
 
  I'll push my code to my github project soon.
 
  P.S. The Pi2 board I possess seems to have broken down. It just isn't
  turning on. Unable to test further. Will order one immediately.
 
 Ouch. Make sure you put it in a safe space for development, clear of
 threats like moisture, static shock, and cats.

  On 3 Jun 2015 09:03, Rohini Kulkarni krohini1...@gmail.com wrote:
 
  Hi,
 
  Alan, your suggestion has resulted in much improvement.
 
  arm_control=0x1000
 
  This has simply worked! Looks like the other cores were taking up
  plenty of time.
  I was aware from references that the other cores run a WFI, but
  did not appreciate its impact.
  The time for each dhrystone has dropped from 13 to 7, and the number of
  dhrystones per second has also increased.
 
  But this is a change only in config.txt, not actually in the boot
  code.
 
  Thanks
 
  Rohini
 
 
 
  On Wed, Jun 3, 2015 at 7:12 AM, Alan Cudmore alan.cudm...@gmail.com
  wrote:
 
  The caches are being enabled on the RPI 1 BSP. The same code is being
  executed by the RPI 2 BSP, but obviously it’s not sufficient for the
  cache setup.
  I have been reading through this long thread, and it is very informative:
  https://www.raspberrypi.org/forums/viewtopic.php?f=72&t=98904
 
  I am starting to understand the setup that is required to enable caches
  on the RPI 2. For example, this message near the bottom of page 3 gives a
  good indication of the speedup available by configuring the MMU and
  caches correctly:
  Quote from above thread
  --
  Enabling I/D caches and branch prediction, just like the julia demo uses,
  it takes ~12 seconds, or ~21 fps. It's just one core but also a much
  smaller loop than the julia demo has.
 
  Enabling the MMU and mapping memory inner/outer write-back, write
  allocate and the framebuffer inner write-through, no write allocate
  + outer write-back, write-allocate it takes ~8 seconds, or 32 fps.
 
  PS: 640x480x32 with MMU gets me ~256 fps. Must have a greater L2 cache
  effect.
  --
  End of quote
 
  The person who posted the above comment (mrvn) posted the code here:
  https://github.com/mrvn/test/blob/master/mmu.cc
 
 
 
  Also, it seems that when the Pi 2 starts up, cores 1-3 are put in a wait
  loop that is always accessing the bus. By putting this option in the
  config.txt file you can put the other cores to sleep, speeding up the
  code on core 0.
   arm_control=0x1000
  It would be worth trying that option to see if the benchmark speeds up.
 
 
  Alan
 
  On Jun 2, 2015, at 8:05 AM, Hesham ALMatary 
 heshamelmat...@gmail.com
  wrote:
 
  On Tue, Jun 2, 2015 at 12:41 PM, Rohini Kulkarni 
 krohini1...@gmail.com
  wrote:
 
  From what I saw, they have 

Re: GSoC 2015: Raspberry Pi 2 Support

2015-06-17 Thread Rohini Kulkarni
Hi all,

I have updated my blog to reflect my understanding of the cache
performance issue and my attempts to fix it.

Lately I have been experimenting with the memory attributes in the
mm_config_table. One set of configurations for cacheable memory (inner and
outer levels) ended up reducing performance further (I really thought it
would improve). So this table setup certainly affects performance.

The results are not improving after turning on the cache, so the memory
sections are perhaps not even getting cached.
I have a feeling it has to do with this mm_config_table.
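
For checking table entries by hand, here is a rough sketch of how the
cacheability bits of an ARMv7-A short-descriptor section entry are encoded.
It assumes TEX remap is disabled and uses the TEX[2]=1 form, where TEX[1:0]
carries the outer policy and C:B the inner policy. The function and constant
names are made up for illustration and are not taken from the RTEMS sources.

```python
# Bit positions in an ARMv7-A short-descriptor first-level section entry.
B_SHIFT = 2     # B bit (inner policy, low bit)
C_SHIFT = 3     # C bit (inner policy, high bit)
TEX_SHIFT = 12  # TEX[2:0] occupies bits 14:12

# Two-bit cache policy codes used when TEX[2] == 1.
NON_CACHEABLE = 0b00
WB_WA = 0b01      # write-back, write-allocate
WT_NO_WA = 0b10   # write-through, no write-allocate
WB_NO_WA = 0b11   # write-back, no write-allocate


def cache_attr_bits(inner, outer):
    """Return the TEX/C/B bits for the given inner and outer cache policies."""
    if not (0 <= inner <= 3 and 0 <= outer <= 3):
        raise ValueError("policy codes are two bits wide")
    tex = 0b100 | outer          # TEX[2]=1 selects the inner/outer form
    c = (inner >> 1) & 1
    b = inner & 1
    return (tex << TEX_SHIFT) | (c << C_SHIFT) | (b << B_SHIFT)


# Normal memory: inner and outer write-back, write-allocate.
print(hex(cache_attr_bits(WB_WA, WB_WA)))
# Framebuffer: inner write-through, outer write-back write-allocate.
print(hex(cache_attr_bits(WT_NO_WA, WB_WA)))
```

Comparing these bits against what actually ends up in the translation table
is one way to tell whether a section is really marked cacheable.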

Updates from the github code and blog might help in further discussion.

Link to github code: https://github.com/krohini1593/rtems/tree/rohini

Link to blog: http://rohiniwithrpi2.blogspot.in/p/blog-page_3.html

Thanks!

On Mon, Jun 15, 2015 at 8:29 PM, Alan Cudmore alan.cudm...@gmail.com
wrote:

 Hi,
 Some of the code examples may give you some clues. Like this one:
 https://github.com/mrvn/test/blob/master/smp.cc

 Or this:
 https://github.com/PeterLemon/RaspberryPi/tree/master/SMP/SMPINIT

 If you still can't figure it out, you can always join the raspberrypi.org
 forums and ask on this thread:
 https://www.raspberrypi.org/forums/viewtopic.php?f=72&t=98904

 When it comes to the Pi 2 and SMP, you are our RTEMS expert :)

 Thanks,
 Alan


 On Sat, Jun 13, 2015 at 2:29 PM, Rohini Kulkarni krohini1...@gmail.com
 wrote:

 Hi,

 This is regarding Pi 2 SMP support. After powering on, the secondary
 cores read one of their four mailbox registers and wait for non-zero
 content to be written. This content is the physical address of the
 location from which the cores are expected to start execution.

 I am stuck at figuring out this address. How should I go about
 understanding this?
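
As a hedged illustration of the mailbox layout being discussed: the sketch
below computes per-core mailbox register addresses, assuming the layout in
Broadcom's BCM2836 ARM-local peripherals document (base 0x40000000, write-set
registers from offset 0x80, read/clear registers from offset 0xC0, four
mailboxes per core). These constants and the helper names are assumptions to
be checked against that document. Which mailbox the boot stub polls is not
settled in this thread; the value written would normally be the physical
entry point of the secondary-core startup code in the loaded image.

```python
ARM_LOCAL_BASE = 0x40000000   # assumed BCM2836 ARM-local peripheral base
MAILBOX_SET_OFFSET = 0x80     # assumed: core 0, mailbox 0 write-set register
MAILBOX_CLR_OFFSET = 0xC0     # assumed: core 0, mailbox 0 read/clear register
CORE_STRIDE = 0x10            # four 32-bit mailboxes per core
MAILBOX_STRIDE = 0x4


def _check(core, mailbox):
    if not (0 <= core <= 3 and 0 <= mailbox <= 3):
        raise ValueError("core and mailbox must each be in 0..3")


def mailbox_set_addr(core, mailbox):
    """Address another core writes to in order to post a value."""
    _check(core, mailbox)
    return (ARM_LOCAL_BASE + MAILBOX_SET_OFFSET
            + CORE_STRIDE * core + MAILBOX_STRIDE * mailbox)


def mailbox_clr_addr(core, mailbox):
    """Address the owning core reads (and writes to clear)."""
    _check(core, mailbox)
    return (ARM_LOCAL_BASE + MAILBOX_CLR_OFFSET
            + CORE_STRIDE * core + MAILBOX_STRIDE * mailbox)


for core in (1, 2, 3):
    print(core, hex(mailbox_set_addr(core, 3)), hex(mailbox_clr_addr(core, 3)))
```

On the real hardware the primary core would store the start address to the
write-set register and the waiting core would see it in the read register.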

 Thanks!
 On 3 Jun 2015 19:44, Gedare Bloom ged...@gwu.edu wrote:

 On Wed, Jun 3, 2015 at 2:39 AM, Rohini Kulkarni krohini1...@gmail.com
 wrote:
  But, I can't say cache configurations have a role here.
 
  I'll push my code to my github project soon.
 
  P.S. The Pi2 board I possess seems to have broken down. It just isn't
  turning on. Unable to test further. Will order one immediately.
 
 Ouch. Make sure you put it in a safe space for development, clear of
 threats like moisture, static shock, and cats.

  On 3 Jun 2015 09:03, Rohini Kulkarni krohini1...@gmail.com wrote:
 
  Hi,
 
  Alan, your suggestion has resulted in much improvement
 
  arm_control=0x1000
 
  This has simply worked! Looks like the other cores were taking up
 plenty
  of time.
  I was aware from references that the other cores run a WFI, but ya,
 did
  not get its impact.
  Time for each dhrystone has reduced to 7 from 13 and the no of
 dhrystones
  per second also increased.
 
  But this is a change only in the config.txt not actually in the boot
 code.
 
  Thanks
 
  Rohini
 
 
 
  On Wed, Jun 3, 2015 at 7:12 AM, Alan Cudmore alan.cudm...@gmail.com
  wrote:
 
  The caches are being enabled on the RPI 1 BSP. The same code is being
  executed by the RPI 2 BSP, but obviously it’s not sufficient for the
 cache
  setup.
  I have been reading through this long thread, and it is very
 informative:
  https://www.raspberrypi.org/forums/viewtopic.php?f=72&t=98904
 
  I am starting to understand the setup that is required to enable
 caches
  on the RPI 2. For example this message near the bottom of page 3
 gives a
  good indication of the speedup available by configuring the MMU and
 caches
  correctly:
  Quote from above thread
  --
  Enabling I/D caches and branch prediction, just like the julia demo
 uses,
  it takes ~12 seconds, or ~21 fps. It's just one core but also a much
 smaller
  loop than the julia demo has.
 
  Enabling the MMU and mapping memory inner/outer write-back, write
  allocate and the framebuffer inner write-through, no write allocate
 + outer
  write-back, write-allocate it takes ~8 seconds, of 32 fps.
 
  PS: 640x480x32 with MMU gets me ~256 fps. Must have a greater L2
 cache
  effect.
  -
  End of quote
 
  The person who posted the above comment (mrvn) posted the code here:
  https://github.com/mrvn/test/blob/master/mmu.cc
 
 
  Also, it seems that when the Pi 2 starts up, cores 1-3 are put in a
 wait
  loop always accessing the bus. By putting this option in the
 config.txt file
  you can put the other cores to sleep, speeding up the code on core 1.
   arm_control=0x1000
  It would be worth trying that option to see if the benchmark speeds
 up.
 
 
  Alan
 
  On Jun 2, 2015, at 8:05 AM, Hesham ALMatary 
 heshamelmat...@gmail.com
  wrote:
 
  On Tue, Jun 2, 2015 at 12:41 PM, Rohini Kulkarni 
 krohini1...@gmail.com
  wrote:
 
  From what I saw, they have to be enabled separately. Cache/mmu are
  disabled
  upon reset.
 
  For the existing Raspberry BSP [1] there's a code for MMU/Cache init,
  however I don't know about Pi2 and where its code is.
 
  [1]
 
 https://github.com/RTEMS/rtems/tree/master/c/src/lib/libbsp/arm/raspberrypi
 
  On 2 

Re: GSoC 2015: Raspberry Pi 2 Support

2015-06-15 Thread Alan Cudmore
Hi,
Some of the code examples may give you some clues. Like this one:
https://github.com/mrvn/test/blob/master/smp.cc

Or this:
https://github.com/PeterLemon/RaspberryPi/tree/master/SMP/SMPINIT

If you still can't figure it out, you can always join the raspberrypi.org
forums and ask on this thread:
https://www.raspberrypi.org/forums/viewtopic.php?f=72&t=98904

When it comes to the Pi 2 and SMP, you are our RTEMS expert :)

Thanks,
Alan


On Sat, Jun 13, 2015 at 2:29 PM, Rohini Kulkarni krohini1...@gmail.com
wrote:

 Hi,

 This is regarding Pi 2 SMP support. After powering on, the secondary
 cores read one of their four mailbox registers and wait for non-zero
 content to be written. This content is the physical address of the
 location from which the cores are expected to start execution.

 I am stuck at figuring out this address. How should I go about
 understanding this?

 Thanks!
 On 3 Jun 2015 19:44, Gedare Bloom ged...@gwu.edu wrote:

 On Wed, Jun 3, 2015 at 2:39 AM, Rohini Kulkarni krohini1...@gmail.com
 wrote:
  But, I can't say cache configurations have a role here.
 
  I'll push my code to my github project soon.
 
  P.S. The Pi2 board I possess seems to have broken down. It just isn't
  turning on. Unable to test further. Will order one immediately.
 
 Ouch. Make sure you put it in a safe space for development, clear of
 threats like moisture, static shock, and cats.

  On 3 Jun 2015 09:03, Rohini Kulkarni krohini1...@gmail.com wrote:
 
  Hi,
 
  Alan, your suggestion has resulted in much improvement
 
  arm_control=0x1000
 
  This has simply worked! Looks like the other cores were taking up
 plenty
  of time.
  I was aware from references that the other cores run a WFI, but ya, did
  not get its impact.
  Time for each dhrystone has reduced to 7 from 13 and the no of
 dhrystones
  per second also increased.
 
  But this is a change only in the config.txt not actually in the boot
 code.
 
  Thanks
 
  Rohini
 
 
 
  On Wed, Jun 3, 2015 at 7:12 AM, Alan Cudmore alan.cudm...@gmail.com
  wrote:
 
  The caches are being enabled on the RPI 1 BSP. The same code is being
  executed by the RPI 2 BSP, but obviously it’s not sufficient for the
 cache
  setup.
  I have been reading through this long thread, and it is very
 informative:
  https://www.raspberrypi.org/forums/viewtopic.php?f=72&t=98904
 
  I am starting to understand the setup that is required to enable
 caches
  on the RPI 2. For example this message near the bottom of page 3
 gives a
  good indication of the speedup available by configuring the MMU and
 caches
  correctly:
  Quote from above thread
  --
  Enabling I/D caches and branch prediction, just like the julia demo
 uses,
  it takes ~12 seconds, or ~21 fps. It's just one core but also a much
 smaller
  loop than the julia demo has.
 
  Enabling the MMU and mapping memory inner/outer write-back, write
  allocate and the framebuffer inner write-through, no write allocate +
 outer
  write-back, write-allocate it takes ~8 seconds, of 32 fps.
 
  PS: 640x480x32 with MMU gets me ~256 fps. Must have a greater L2 cache
  effect.
  -
  End of quote
 
  The person who posted the above comment (mrvn) posted the code here:
  https://github.com/mrvn/test/blob/master/mmu.cc
 
 
  Also, it seems that when the Pi 2 starts up, cores 1-3 are put in a
 wait
  loop always accessing the bus. By putting this option in the
 config.txt file
  you can put the other cores to sleep, speeding up the code on core 1.
   arm_control=0x1000
  It would be worth trying that option to see if the benchmark speeds
 up.
 
 
  Alan
 
  On Jun 2, 2015, at 8:05 AM, Hesham ALMatary heshamelmat...@gmail.com
 
  wrote:
 
  On Tue, Jun 2, 2015 at 12:41 PM, Rohini Kulkarni 
 krohini1...@gmail.com
  wrote:
 
  From what I saw, they have to be enabled separately. Cache/mmu are
  disabled
  upon reset.
 
  For the existing Raspberry BSP [1] there's a code for MMU/Cache init,
  however I don't know about Pi2 and where its code is.
 
  [1]
 
 https://github.com/RTEMS/rtems/tree/master/c/src/lib/libbsp/arm/raspberrypi
 
  On 2 Jun 2015 16:59, Hesham ALMatary heshamelmat...@gmail.com
 wrote:
 
 
  Hi,
 
  Aren't the MMU/Caches enabled by default for RPi [1]?
 
  [1]
 
 
 https://github.com/RTEMS/rtems/blob/master/c/src/lib/libbsp/arm/shared/mminit.c
 
  On Tue, Jun 2, 2015 at 12:18 PM, Joel Sherrill
  joel.sherr...@oarcorp.com wrote:
 
 
 
  On June 2, 2015 7:01:21 AM EDT, Rohini Kulkarni 
 krohini1...@gmail.com
  wrote:
 
  Dr. Joel,
 
  So we can't say something solely on the basis of this result?
 
 
  I don't think so. If Linux performs the same, then what you did is as
  good as it gets.
 
  However, if Linux is faster then some setting still isn't right.
 
  You need a reference measurement to have any confidence. It is
 possible
  you did something but didn't actually turn the cache (or all the
 cache)
  on.
 
  On 2 Jun 2015 16:28, Rohini Kulkarni krohini1...@gmail.com 

Re: GSoC 2015: Raspberry Pi 2 Support

2015-06-03 Thread Rohini Kulkarni
But, I can't say cache configurations have a role here.

I'll push my code to my github project soon.

P.S. The Pi2 board I possess seems to have broken down. It just isn't
turning on. Unable to test further. Will order one immediately.
On 3 Jun 2015 09:03, Rohini Kulkarni krohini1...@gmail.com wrote:

 Hi,

 Alan, your suggestion has resulted in much improvement

 arm_control=0x1000

 This has simply worked! Looks like the other cores were taking up plenty
 of time.
 I was aware from references that the other cores run a WFI, but ya, did
 not get its impact.
 Time for each dhrystone has reduced to 7 from 13 and the no of dhrystones
 per second also increased.

 But this is a change only in the config.txt not actually in the boot code.


 Thanks

 Rohini



 On Wed, Jun 3, 2015 at 7:12 AM, Alan Cudmore alan.cudm...@gmail.com
 wrote:

 The caches are being enabled on the RPI 1 BSP. The same code is being
 executed by the RPI 2 BSP, but obviously it’s not sufficient for the cache
 setup.
 I have been reading through this long thread, and it is very informative:
 https://www.raspberrypi.org/forums/viewtopic.php?f=72&t=98904

 I am starting to understand the setup that is required to enable caches
 on the RPI 2. For example this message near the bottom of page 3 gives a
 good indication of the speedup available by configuring the MMU and caches
 correctly:
 Quote from above thread
 --
 Enabling I/D caches and branch prediction, just like the julia demo uses,
 it takes ~12 seconds, or ~21 fps. It's just one core but also a much
 smaller loop than the julia demo has.

 Enabling the MMU and mapping memory inner/outer write-back, write
 allocate and the framebuffer inner write-through, no write allocate + outer
 write-back, write-allocate it takes ~8 seconds, of 32 fps.

 PS: 640x480x32 with MMU gets me ~256 fps. Must have a greater L2 cache
 effect.
 -
 End of quote

 The person who posted the above comment (mrvn) posted the code here:
 https://github.com/mrvn/test/blob/master/mmu.cc


 Also, it seems that when the Pi 2 starts up, cores 1-3 are put in a wait
 loop always accessing the bus. By putting this option in the config.txt
 file you can put the other cores to sleep, speeding up the code on core 1.
  arm_control=0x1000
 It would be worth trying that option to see if the benchmark speeds up.


 Alan

 On Jun 2, 2015, at 8:05 AM, Hesham ALMatary heshamelmat...@gmail.com
 wrote:

 On Tue, Jun 2, 2015 at 12:41 PM, Rohini Kulkarni krohini1...@gmail.com
 wrote:

 From what I saw, they have to be enabled separately. Cache/mmu are
 disabled
 upon reset.

 For the existing Raspberry BSP [1] there's a code for MMU/Cache init,
 however I don't know about Pi2 and where its code is.

 [1]
 https://github.com/RTEMS/rtems/tree/master/c/src/lib/libbsp/arm/raspberrypi

 On 2 Jun 2015 16:59, Hesham ALMatary heshamelmat...@gmail.com wrote:


 Hi,

 Aren't the MMU/Caches enabled by default for RPi [1]?

 [1]

 https://github.com/RTEMS/rtems/blob/master/c/src/lib/libbsp/arm/shared/mminit.c

 On Tue, Jun 2, 2015 at 12:18 PM, Joel Sherrill
 joel.sherr...@oarcorp.com wrote:



 On June 2, 2015 7:01:21 AM EDT, Rohini Kulkarni krohini1...@gmail.com
 wrote:

 Dr. Joel,

 So we can't say something solely on the basis of this result?


 I don't think so. If Linux performs the same, then what you did is as
 good as it gets.

 However, if Linux is faster then some setting still isn't right.

 You need a reference measurement to have any confidence. It is possible
 you did something but didn't actually turn the cache (or all the cache)
 on.

 On 2 Jun 2015 16:28, Rohini Kulkarni krohini1...@gmail.com wrote:

 I have not run it under linux on pi2 yet. Will have to run and check
 the result.

 On 2 Jun 2015 16:16, Joel Sherrill joel.sherr...@oarcorp.com wrote:



 On June 2, 2015 5:58:33 AM EDT, Rohini Kulkarni krohini1...@gmail.com
 wrote:

 HI,

 I tried running the dhrystone benchmark with some changes for

 cache/mmu

 set up.

 However, the output shows a reduction in performance.
 The time to run through the dhrystone has increased from 12 to 13 and
 dhrystones run per second decreased.

 According to this result, things were better with caches disabled.


 I have been working on this since two days and could not figure out an
 improvement. Any pointers?


 How did it do under Linux on the Pi2?


 Thanks.



 On Thu, May 28, 2015 at 8:41 PM, Rohini Kulkarni
 krohini1...@gmail.com wrote:

 Hi All,

 I have to implement the cache coherency support for Cortex A7. But for
 A7 MPCore, unlike for A9, I am not able to find any register
 description for the Snoop Control Unit from the TRM.

 I need help here on how to proceed.

 Additionally for A9 there is a single bit for A9 in the Auxiliary
 Control Register which enables cache broadcast operations. The

 register

 format is different for A7 and again I am unable to find how to

 achieve

 the same for A7.

 Thanks!





 On Tue, May 5, 

Re: GSoC 2015: Raspberry Pi 2 Support

2015-06-03 Thread Gedare Bloom
On Wed, Jun 3, 2015 at 2:39 AM, Rohini Kulkarni krohini1...@gmail.com wrote:
 But, I can't say cache configurations have a role here.

 I'll push my code to my github project soon.

 P.S. The Pi2 board I possess seems to have broken down. It just isn't
 turning on. Unable to test further. Will order one immediately.

Ouch. Make sure you put it in a safe space for development, clear of
threats like moisture, static shock, and cats.

 On 3 Jun 2015 09:03, Rohini Kulkarni krohini1...@gmail.com wrote:

 Hi,

 Alan, your suggestion has resulted in much improvement

 arm_control=0x1000

 This has simply worked! Looks like the other cores were taking up plenty
 of time.
 I was aware from references that the other cores run a WFI, but ya, did
 not get its impact.
 Time for each dhrystone has reduced to 7 from 13 and the no of dhrystones
 per second also increased.

 But this is a change only in the config.txt not actually in the boot code.

 Thanks

 Rohini



 On Wed, Jun 3, 2015 at 7:12 AM, Alan Cudmore alan.cudm...@gmail.com
 wrote:

 The caches are being enabled on the RPI 1 BSP. The same code is being
 executed by the RPI 2 BSP, but obviously it’s not sufficient for the cache
 setup.
 I have been reading through this long thread, and it is very informative:
 https://www.raspberrypi.org/forums/viewtopic.php?f=72&t=98904

 I am starting to understand the setup that is required to enable caches
 on the RPI 2. For example this message near the bottom of page 3 gives a
 good indication of the speedup available by configuring the MMU and caches
 correctly:
 Quote from above thread
 --
 Enabling I/D caches and branch prediction, just like the julia demo uses,
 it takes ~12 seconds, or ~21 fps. It's just one core but also a much smaller
 loop than the julia demo has.

 Enabling the MMU and mapping memory inner/outer write-back, write
 allocate and the framebuffer inner write-through, no write allocate + outer
 write-back, write-allocate it takes ~8 seconds, of 32 fps.

 PS: 640x480x32 with MMU gets me ~256 fps. Must have a greater L2 cache
 effect.
 -
 End of quote

 The person who posted the above comment (mrvn) posted the code here:
 https://github.com/mrvn/test/blob/master/mmu.cc


 Also, it seems that when the Pi 2 starts up, cores 1-3 are put in a wait
 loop always accessing the bus. By putting this option in the config.txt file
 you can put the other cores to sleep, speeding up the code on core 1.
  arm_control=0x1000
 It would be worth trying that option to see if the benchmark speeds up.


 Alan

 On Jun 2, 2015, at 8:05 AM, Hesham ALMatary heshamelmat...@gmail.com
 wrote:

 On Tue, Jun 2, 2015 at 12:41 PM, Rohini Kulkarni krohini1...@gmail.com
 wrote:

 From what I saw, they have to be enabled separately. Cache/mmu are
 disabled
 upon reset.

 For the existing Raspberry BSP [1] there's a code for MMU/Cache init,
 however I don't know about Pi2 and where its code is.

 [1]
 https://github.com/RTEMS/rtems/tree/master/c/src/lib/libbsp/arm/raspberrypi

 On 2 Jun 2015 16:59, Hesham ALMatary heshamelmat...@gmail.com wrote:


 Hi,

 Aren't the MMU/Caches enabled by default for RPi [1]?

 [1]

 https://github.com/RTEMS/rtems/blob/master/c/src/lib/libbsp/arm/shared/mminit.c

 On Tue, Jun 2, 2015 at 12:18 PM, Joel Sherrill
 joel.sherr...@oarcorp.com wrote:



 On June 2, 2015 7:01:21 AM EDT, Rohini Kulkarni krohini1...@gmail.com
 wrote:

 Dr. Joel,

 So we can't say something solely on the basis of this result?


 I don't think so. If Linux performs the same, then what you did is as
 good as it gets.

 However, if Linux is faster then some setting still isn't right.

 You need a reference measurement to have any confidence. It is possible
 you did something but didn't actually turn the cache (or all the cache)
 on.

 On 2 Jun 2015 16:28, Rohini Kulkarni krohini1...@gmail.com wrote:

 I have not run it under linux on pi2 yet. Will have to run and check
 the result.

 On 2 Jun 2015 16:16, Joel Sherrill joel.sherr...@oarcorp.com wrote:



 On June 2, 2015 5:58:33 AM EDT, Rohini Kulkarni krohini1...@gmail.com
 wrote:

 HI,

 I tried running the dhrystone benchmark with some changes for

 cache/mmu

 set up.

 However, the output shows a reduction in performance.
 The time to run through the dhrystone has increased from 12 to 13 and
 dhrystones run per second decreased.

 According to this result, things were better with caches disabled.


 I have been working on this since two days and could not figure out an
 improvement. Any pointers?


 How did it do under Linux on the Pi2?


 Thanks.



 On Thu, May 28, 2015 at 8:41 PM, Rohini Kulkarni
 krohini1...@gmail.com wrote:

 Hi All,

 I have to implement the cache coherency support for Cortex A7. But for
 A7 MPCore, unlike for A9, I am not able to find any register
 description for the Snoop Control Unit from the TRM.

 I need help here on how to proceed.

 Additionally for A9 there is a single bit for A9 in the Auxiliary
 

Re: GSoC 2015: Raspberry Pi 2 Support

2015-06-03 Thread Gedare Bloom
On Tue, Jun 2, 2015 at 9:42 PM, Alan Cudmore alan.cudm...@gmail.com wrote:
 The caches are being enabled on the RPI 1 BSP. The same code is being
 executed by the RPI 2 BSP, but obviously it’s not sufficient for the cache
 setup.
 I have been reading through this long thread, and it is very informative:
 https://www.raspberrypi.org/forums/viewtopic.php?f=72&t=98904

 I am starting to understand the setup that is required to enable caches on
 the RPI 2. For example this message near the bottom of page 3 gives a good
 indication of the speedup available by configuring the MMU and caches
 correctly:
 Quote from above thread
 --
 Enabling I/D caches and branch prediction, just like the julia demo uses, it
 takes ~12 seconds, or ~21 fps. It's just one core but also a much smaller
 loop than the julia demo has.

 Enabling the MMU and mapping memory inner/outer write-back, write allocate
 and the framebuffer inner write-through, no write allocate + outer
 write-back, write-allocate it takes ~8 seconds, of 32 fps.

 PS: 640x480x32 with MMU gets me ~256 fps. Must have a greater L2 cache
 effect.
 -
 End of quote

 The person who posted the above comment (mrvn) posted the code here:
 https://github.com/mrvn/test/blob/master/mmu.cc

Make sure not to copy the code from there, though, as it is GPL3. You
may still refer to it for learning.
___
devel mailing list
devel@rtems.org
http://lists.rtems.org/mailman/listinfo/devel

Re: GSoC 2015: Raspberry Pi 2 Support

2015-06-02 Thread Rohini Kulkarni
Hi,

Alan, your suggestion has resulted in much improvement

arm_control=0x1000

This has simply worked! Looks like the other cores were taking up plenty of
time.
I was aware from references that the other cores run a WFI, but did not
appreciate its impact.
The time for each dhrystone has reduced from 13 to 7, and the number of
dhrystones per second has also increased.
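
To put the timing numbers above in perspective, here is a small sketch. It
assumes the benchmark reports microseconds per run, as the standard Dhrystone
output does; the thread does not state the unit, and the function names are
made up for illustration.

```python
def dhrystones_per_second(us_per_run):
    """Convert time per run (assumed microseconds) to runs per second."""
    if us_per_run <= 0:
        raise ValueError("time per run must be positive")
    return 1_000_000 / us_per_run


def speedup(before_us, after_us):
    """Ratio of old time to new time; above 1.0 means faster."""
    return before_us / after_us


before, after = 13, 7   # values reported in this thread
print(round(dhrystones_per_second(before)))
print(round(dhrystones_per_second(after)))
print(round(speedup(before, after), 2))
```

Under that assumption the config.txt change alone is a bit under a 2x gain,
which is why a reference measurement is still needed to judge the cache setup.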

But this is a change only in the config.txt not actually in the boot code.

Thanks

Rohini



On Wed, Jun 3, 2015 at 7:12 AM, Alan Cudmore alan.cudm...@gmail.com wrote:

 The caches are being enabled on the RPI 1 BSP. The same code is being
 executed by the RPI 2 BSP, but obviously it’s not sufficient for the cache
 setup.
 I have been reading through this long thread, and it is very informative:
 https://www.raspberrypi.org/forums/viewtopic.php?f=72&t=98904

 I am starting to understand the setup that is required to enable caches on
 the RPI 2. For example this message near the bottom of page 3 gives a good
 indication of the speedup available by configuring the MMU and caches
 correctly:
 Quote from above thread
 --
 Enabling I/D caches and branch prediction, just like the julia demo uses,
 it takes ~12 seconds, or ~21 fps. It's just one core but also a much
 smaller loop than the julia demo has.

 Enabling the MMU and mapping memory inner/outer write-back, write allocate
 and the framebuffer inner write-through, no write allocate + outer
 write-back, write-allocate it takes ~8 seconds, of 32 fps.

 PS: 640x480x32 with MMU gets me ~256 fps. Must have a greater L2 cache
 effect.
 -
 End of quote

 The person who posted the above comment (mrvn) posted the code here:
 https://github.com/mrvn/test/blob/master/mmu.cc


 Also, it seems that when the Pi 2 starts up, cores 1-3 are put in a wait
 loop always accessing the bus. By putting this option in the config.txt
 file you can put the other cores to sleep, speeding up the code on core 1.
  arm_control=0x1000
 It would be worth trying that option to see if the benchmark speeds up.


 Alan

 On Jun 2, 2015, at 8:05 AM, Hesham ALMatary heshamelmat...@gmail.com
 wrote:

 On Tue, Jun 2, 2015 at 12:41 PM, Rohini Kulkarni krohini1...@gmail.com
 wrote:

 From what I saw, they have to be enabled separately. Cache/mmu are disabled
 upon reset.

 For the existing Raspberry BSP [1] there's a code for MMU/Cache init,
 however I don't know about Pi2 and where its code is.

 [1]
 https://github.com/RTEMS/rtems/tree/master/c/src/lib/libbsp/arm/raspberrypi

 On 2 Jun 2015 16:59, Hesham ALMatary heshamelmat...@gmail.com wrote:


 Hi,

 Aren't the MMU/Caches enabled by default for RPi [1]?

 [1]

 https://github.com/RTEMS/rtems/blob/master/c/src/lib/libbsp/arm/shared/mminit.c

 On Tue, Jun 2, 2015 at 12:18 PM, Joel Sherrill
 joel.sherr...@oarcorp.com wrote:



 On June 2, 2015 7:01:21 AM EDT, Rohini Kulkarni krohini1...@gmail.com
 wrote:

 Dr. Joel,

 So we can't say something solely on the basis of this result?


 I don't think so. If Linux performs the same, then what you did is as
 good as it gets.

 However, if Linux is faster then some setting still isn't right.

 You need a reference measurement to have any confidence. It is possible
 you did something but didn't actually turn the cache (or all the cache) on.

 On 2 Jun 2015 16:28, Rohini Kulkarni krohini1...@gmail.com wrote:

 I have not run it under linux on pi2 yet. Will have to run and check
 the result.

 On 2 Jun 2015 16:16, Joel Sherrill joel.sherr...@oarcorp.com wrote:



 On June 2, 2015 5:58:33 AM EDT, Rohini Kulkarni krohini1...@gmail.com
 wrote:

 HI,

 I tried running the dhrystone benchmark with some changes for

 cache/mmu

 set up.

 However, the output shows a reduction in performance.
 The time to run through the dhrystone has increased from 12 to 13 and
 dhrystones run per second decreased.

 According to this result, things were better with caches disabled.


 I have been working on this since two days and could not figure out an
 improvement. Any pointers?


 How did it do under Linux on the Pi2?


 Thanks.



 On Thu, May 28, 2015 at 8:41 PM, Rohini Kulkarni
 krohini1...@gmail.com wrote:

 Hi All,

 I have to implement the cache coherency support for Cortex A7. But for
 A7 MPCore, unlike for A9, I am not able to find any register
 description for the Snoop Control Unit from the TRM.

 I need help here on how to proceed.

 Additionally for A9 there is a single bit for A9 in the Auxiliary
 Control Register which enables cache broadcast operations. The

 register

 format is different for A7 and again I am unable to find how to

 achieve

 the same for A7.

 Thanks!





 On Tue, May 5, 2015 at 10:42 PM, Joel Sherrill
 joel.sherr...@oarcorp.com wrote:



 On 5/5/2015 11:11 AM, Rohini Kulkarni wrote:

 Hi,

 I am working with the code for bsp hooks. I am referring to existing
 ARM multicore bsp codes, zync mainly.

 1. There are existing hooks for the raspberry pi. Where should the

 code

 for the  Pi2 

Re: GSoC 2015: Raspberry Pi 2 Support

2015-06-02 Thread Rohini Kulkarni
From what I saw, they have to be enabled separately. Cache/mmu are disabled
upon reset.
On 2 Jun 2015 16:59, Hesham ALMatary heshamelmat...@gmail.com wrote:

 Hi,

 Aren't the MMU/Caches enabled by default for RPi [1]?

 [1]
 https://github.com/RTEMS/rtems/blob/master/c/src/lib/libbsp/arm/shared/mminit.c

 On Tue, Jun 2, 2015 at 12:18 PM, Joel Sherrill
 joel.sherr...@oarcorp.com wrote:
 
 
  On June 2, 2015 7:01:21 AM EDT, Rohini Kulkarni krohini1...@gmail.com
 wrote:
 Dr. Joel,
 
 So we can't say something solely on the basis of this result?
 
  I don't think so. If Linux performs the same, then what you did is as
 good as it gets.
 
  However, if Linux is faster then some setting still isn't right.
 
  You need a reference measurement to have any confidence. It is possible
 you did something but didn't actually turn the cache (or all the cache) on.
 
 On 2 Jun 2015 16:28, Rohini Kulkarni krohini1...@gmail.com wrote:
 
 I have not run it under linux on pi2 yet. Will have to run and check
 the result.
 
 On 2 Jun 2015 16:16, Joel Sherrill joel.sherr...@oarcorp.com wrote:
 
 
 
 On June 2, 2015 5:58:33 AM EDT, Rohini Kulkarni krohini1...@gmail.com
 wrote:
 HI,
 
 I tried running the dhrystone benchmark with some changes for
 cache/mmu
 set up.
 
 However, the output shows a reduction in performance.
 The time to run through the dhrystone has increased from 12 to 13 and
 dhrystones run per second decreased.
 
 According to this result, things were better with caches disabled.
 
 
 I have been working on this since two days and could not figure out an
 improvement. Any pointers?
 
 How did it do under Linux on the Pi2?
 
 
 Thanks.
 
 
 
 On Thu, May 28, 2015 at 8:41 PM, Rohini Kulkarni
 krohini1...@gmail.com wrote:
 
 Hi All,
 
 I have to implement the cache coherency support for Cortex A7. But for
 A7 MPCore, unlike for A9, I am not able to find any register
 description for the Snoop Control Unit from the TRM.
 
 I need help here on how to proceed.
 
 Additionally for A9 there is a single bit for A9 in the Auxiliary
 Control Register which enables cache broadcast operations. The
 register
 format is different for A7 and again I am unable to find how to
 achieve
 the same for A7.
 
 Thanks!
 
 
 
 
 
 On Tue, May 5, 2015 at 10:42 PM, Joel Sherrill
 joel.sherr...@oarcorp.com wrote:
 
 
 
 On 5/5/2015 11:11 AM, Rohini Kulkarni wrote:
 
 Hi,
 
 I am working with the code for bsp hooks. I am referring to existing
 ARM multicore bsp codes, zync mainly.
 
 1. There are existing hooks for the raspberry pi. Where should the
 code
 for the  Pi2 hooks be added?
 
 The Pi and Pi2 are remarkably similar so Pi2 should be placed inside
 the Pi BSP directory.
 There is already a Pi2 variant of that code built. But we know
 specific
 places where there
 are variances. Depending on the scope of what is different, it can be
 as simple as
 a cpp conditional in a .h to select a value or two implementations of
 a
 single method
 and the Makefile.am picking the right file to build based on the board
 variant.
 
 The big question to always ask is: Is this specific to the Pi2 and
 incompatible with the Pi?
 
 Since the Pi BSP is still missing capabilities, it is likely code
 common to both will
 be added this summer. For example, did the mailbox interface change? I
 don't know
 but would guess that it didn't.  Each new capability added needs that
 added.
 
 And any differences need to be analyzed to pick the least intrusive
 way
 to provide
 alternate implementations. Or enable special code like the Pi2 SMP
 support which
 is dependent on --enable-smp and being a Pi2.
 
 2. Am I right in understanding that I will have to implement A7
 specific functions as have been for A9? I am referring specifically to
 the arm-a9mpcore-start.h
 
 Yes.
 
 If the code is very similar between the a7 and a9, then a discussion
 on devel@ should occur to decide the best way to minimize duplication.
 
 If you end up with a7 specific code, you should follow the location
 and
 
 naming patterns already established. That places it in
 libbsp/arm/shared/...
 so it can be used by any BSP with the right SMP core.
 
 
 I am referring to existing codes to locate and get hold of what needs
 to be done in the hooks. However, being new to such implementations, I
 am taking longer to understand the details. Any suggestions that might
 help here are welcome
 
 The answer will depend on the factors listed above. When code can
 be shared, we want to share it across as many BSPs as makes sense.
 When it is unique to a specific BSP **variant** (e.g. Pi vs Pi2), then
 you want to find the way to account for the variation in the least
 intrusive code way possible.
 
 Thanks!
 
 On 1 May 2015 12:45, Rohini Kulkarni krohini1...@gmail.com wrote:
 
 
 Hi,
 
 Excited to be a part of  this edition of GSoC! Thanks to everybody for
 helping me get here and congratulations to all the participating
 students!
 
 So, now getting to work, firstly I wish to know, specifically from 

Re: GSoC 2015: Raspberry Pi 2 Support

2015-06-02 Thread Joel Sherrill


On June 2, 2015 7:29:52 AM EDT, Hesham ALMatary heshamelmat...@gmail.com 
wrote:
Hi,

Aren't the MMU/Caches enabled by default for RPi [1]?

Yes, but I recall that the setup is different on the Pi2 and Alan disabled the
code to get it to work at all.

[1]
https://github.com/RTEMS/rtems/blob/master/c/src/lib/libbsp/arm/shared/mminit.c


Re: GSoC 2015: Raspberry Pi 2 Support

2015-06-02 Thread Rohini Kulkarni
Hi,

I tried running the Dhrystone benchmark with some changes to the cache/MMU
setup.

However, the output shows a reduction in performance.
The time to run through the benchmark increased from 12 to 13, and the
number of dhrystones run per second decreased.
According to this result, things were better with caches disabled.
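
For reference, a minimal sketch of how the two reported figures relate. It
assumes the times are in seconds (the unit is not stated above) and that the
run count is fixed; dhrystones_per_second and report are hypothetical helpers,
not part of the benchmark source. With a fixed number of runs, a longer
elapsed time directly means fewer dhrystones per second.

```c
#include <stdio.h>

/*
 * Hypothetical helper: dhrystones per second for `runs` iterations that
 * took `seconds` of elapsed time (unit assumed, not stated in the thread).
 * Returns 0.0 when the elapsed time is zero or negative, because the rate
 * cannot be computed from a time that is too small to measure.
 */
double dhrystones_per_second(unsigned long runs, double seconds)
{
    if (seconds <= 0.0) {
        return 0.0;
    }
    return (double)runs / seconds;
}

/* Print the rate for one measurement. */
void report(const char *label, unsigned long runs, double seconds)
{
    printf("%s: %.1f dhrystones/s\n", label,
           dhrystones_per_second(runs, seconds));
}
```

Calling report() once with a time of 12 and once with 13, using the same
illustrative run count both times, shows the rate dropping as the time rises.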

I have been working on this for two days and could not find an
improvement. Any pointers?

Thanks.


-- 
Rohini Kulkarni
___
devel mailing list
devel@rtems.org
http://lists.rtems.org/mailman/listinfo/devel

Re: GSoC 2015: Raspberry Pi 2 Support

2015-06-02 Thread Hesham ALMatary
On Tue, Jun 2, 2015 at 12:41 PM, Rohini Kulkarni krohini1...@gmail.com wrote:
 From what I saw, they have to be enabled separately. Cache/mmu are disabled
 upon reset.

For the existing Raspberry Pi BSP [1] there is code for MMU/cache init;
however, I don't know about the Pi2 or where its code is.

[1] https://github.com/RTEMS/rtems/tree/master/c/src/lib/libbsp/arm/raspberrypi
 On 2 Jun 2015 16:59, Hesham ALMatary heshamelmat...@gmail.com wrote:

 Hi,

 Aren't the MMU/Caches enabled by default for RPi [1]?

 [1]
 https://github.com/RTEMS/rtems/blob/master/c/src/lib/libbsp/arm/shared/mminit.c

 On Tue, Jun 2, 2015 at 12:18 PM, Joel Sherrill
 joel.sherr...@oarcorp.com wrote:
 
 
  On June 2, 2015 7:01:21 AM EDT, Rohini Kulkarni krohini1...@gmail.com
  wrote:
 Dr. Joel,
 
 So we can't say something solely on the basis of this result?
 
  I don't think so. If Linux performs the same, then what you did is as
  good as it gets.
 
  However, if Linux is faster then some setting still isn't right.
 
  You need a reference measurement to have any confidence. It is possible
  you did something but didn't actually turn the cache (or all the cache) on.
 
 On 2 Jun 2015 16:28, Rohini Kulkarni krohini1...@gmail.com wrote:
 
 I have not run it under linux on pi2 yet. Will have to run and check
 the result.
 

Re: GSoC 2015: Raspberry Pi 2 Support

2015-06-02 Thread Joel Sherrill


On June 2, 2015 5:58:33 AM EDT, Rohini Kulkarni krohini1...@gmail.com wrote:
Hi,

I tried running the Dhrystone benchmark with some changes to the cache/MMU
setup.

However, the output shows a reduction in performance.
The time to run through the benchmark increased from 12 to 13, and the
number of dhrystones run per second decreased.

According to this result, things were better with caches disabled.


I have been working on this for two days and could not find an
improvement. Any pointers?

How did it do under Linux on the Pi2?


Thanks.



--joel


RPI2 Cache configuration Was: Re: GSoC 2015: Raspberry Pi 2 Support

2015-05-29 Thread Gedare Bloom
On Thu, May 28, 2015 at 10:11 AM, Rohini Kulkarni krohini1...@gmail.com wrote:
 Hi All,

 I have to implement cache coherency support for the Cortex-A7. But for the
 A7 MPCore, unlike for the A9, I am not able to find any register
 description for the Snoop Control Unit in the TRM.
 I need help here on how to proceed.

Based on 10 minutes of searching through the online TRM, I guess(?)
you need to set the pins for the ACE configuration as described in
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0464e/BABJECBF.html

See also: 
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0464f/BABBAAII.html

 Additionally, for the A9 there is a single bit in the Auxiliary Control
 Register that enables cache broadcast operations. The register format is
 different for the A7, and again I am unable to find how to achieve the
 same for the A7.

I think this is also controlled through the ACE.

 Thanks!





Re: GSoC 2015: Raspberry Pi 2 Support

2015-05-28 Thread Rohini Kulkarni
Hi All,

I have to implement cache coherency support for the Cortex-A7. But for the
A7 MPCore, unlike for the A9, I am not able to find any register
description for the Snoop Control Unit in the TRM.
I need help here on how to proceed.

Additionally, for the A9 there is a single bit in the Auxiliary Control
Register that enables cache broadcast operations. The register format is
different for the A7, and again I am unable to find how to achieve the
same for the A7.

Thanks!




-- 
Rohini Kulkarni

Re: GSoC 2015: Raspberry Pi 2 Support

2015-05-05 Thread Joel Sherrill


On 5/5/2015 11:11 AM, Rohini Kulkarni wrote:

 Hi,

 I am working with the code for BSP hooks. I am referring to existing
 ARM multicore BSP code, mainly the Zynq.

 1. There are existing hooks for the Raspberry Pi. Where should the
 code for the Pi2 hooks be added?

The Pi and Pi2 are remarkably similar, so Pi2 code should be placed inside
the Pi BSP directory. There is already a Pi2 variant of that code built.
But we know specific places where there are variances. Depending on the
scope of what is different, it can be as simple as a cpp conditional in a
.h to select a value, or two implementations of a single method with the
Makefile.am picking the right file to build based on the board variant.
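
A minimal sketch of the "cpp conditional in a .h to select a value" option,
under stated assumptions: EXAMPLE_BSP_IS_PI2, EXAMPLE_PERIPHERAL_BASE and
example_peripheral_address are hypothetical names for illustration only, not
existing RTEMS identifiers, and the two base addresses are the commonly
documented ARM-side peripheral bases of the BCM2835 (Pi) and BCM2836 (Pi2).

```c
#include <stdint.h>

/*
 * Hypothetical variant switch. A real build would have the BSP
 * configuration define it; it defaults to the original Pi here.
 */
#ifndef EXAMPLE_BSP_IS_PI2
#define EXAMPLE_BSP_IS_PI2 0
#endif

/* Select a single value per board variant with a preprocessor conditional. */
#if EXAMPLE_BSP_IS_PI2
#define EXAMPLE_PERIPHERAL_BASE UINT32_C(0x3F000000) /* BCM2836 (Pi2) */
#else
#define EXAMPLE_PERIPHERAL_BASE UINT32_C(0x20000000) /* BCM2835 (Pi) */
#endif

/* Code shared by both variants only ever sees the selected value. */
static inline uint32_t example_peripheral_address(uint32_t offset)
{
    return EXAMPLE_PERIPHERAL_BASE + offset;
}
```

Building with -DEXAMPLE_BSP_IS_PI2=1 would select the Pi2 value without
touching any of the shared code that calls example_peripheral_address().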

The big question to always ask is: Is this specific to the Pi2 and
incompatible with the Pi?

Since the Pi BSP is still missing capabilities, it is likely code common
to both will be added this summer. For example, did the mailbox interface
change? I don't know but would guess that it didn't.  Each new capability
added needs that added.

And any differences need to be analyzed to pick the least intrusive way
to provide alternate implementations. Or enable special code like the Pi2
SMP support which is dependent on --enable-smp and being a Pi2.

 2. Am I right in understanding that I will have to implement A7
 specific functions, as has been done for the A9? I am referring
 specifically to arm-a9mpcore-start.h.

Yes.

If the code is very similar between the a7 and a9, then a discussion
on devel@ should occur to decide the best way to minimize duplication.

If you end up with a7 specific code, you should follow the location and
naming patterns already established. That places it in
libbsp/arm/shared/...
so it can be used by any BSP with the right SMP core.


 I am reading the existing code to work out what needs to be done in
 the hooks. However, being new to such implementations, I am taking
 longer to understand the details. Any suggestions that might help
 here are welcome.

The answer will depend on the factors listed above. When code can
be shared, we want to share it across as many BSPs as makes sense.
When it is unique to a specific BSP **variant** (e.g. Pi vs Pi2), then
you want to find the least intrusive way in the code to account for the
variation.

 Thanks!


-- 
Joel Sherrill, Ph.D.             Director of Research & Development
joel.sherr...@oarcorp.com        On-Line Applications Research
Ask me about RTEMS: a free RTOS  Huntsville AL 35805
Support Available                (256) 722-9985


Re: GSoC 2015: Raspberry Pi 2 Support

2015-05-01 Thread Gedare Bloom
On Fri, May 1, 2015 at 3:15 AM, Rohini Kulkarni krohini1...@gmail.com wrote:

 Hi,

 Excited to be a part of  this edition of GSoC! Thanks to everybody for
 helping me get here and congratulations to all the participating students!

 So, now getting to work, firstly I wish to know, specifically from my
 mentors, any changes that must be made to my proposed project or schedule.

Get yourself an rpi2 working first, and start to prototype the code
for smp support by copying from another bsp like the zynq.

 Secondly, are there any specifics for the development blog that we need to
 create for the project? Over time what is the blog expected to convey.

The blog should help you to disseminate your design notes, progress,
and instructions on how to reproduce your testing. The blog is a
useful place for you to put lots of details about your project to have
a single consolidated record about it.

 Also, I have to create a new wiki page for my project as none exists. I want
 to know how to add one.

You can create a TracLink to a wiki page by editing the Tracking
Table and then clicking the link to the page. You should create it under
the GSoC/2015/ folder with a suitable name for your project, e.g.
/GSoC/2015/RaspberryPi2Support

Gedare

 --
 Rohini Kulkarni
